Abstract. Separation Logic with inductive definitions is a well-known
approach for deductive verification of programs that manipulate dynamic
data structures. Deciding verification conditions in this context is usually
based on user-provided lemmas relating the inductive definitions.
We propose a novel approach for generating these lemmas automatically,
based on simple syntactic criteria and deterministic strategies
for applying them. Our approach focuses on iterative programs, although
it can be applied to recursive programs as well, and on specifications that
describe not only the shape of the data structures, but also their content
or their size. Empirically, we find that our approach is powerful
enough to deal efficiently with sophisticated benchmarks, e.g., iterative
procedures for searching, inserting, or deleting elements in sorted lists,
binary search trees, red-black trees, and AVL trees.


1 Introduction 

Program verification requires reasoning about complex, unbounded size data 
structures that may carry data ranging over infinite domains. Examples of such 
structures are multi-linked lists, nested lists, trees, etc. Programs manipulating 
such structures perform operations that may modify their shape (due to dynamic 
creation and destructive updates) as well as the data attached to their elements. 
An important issue is the design of logic-based frameworks that can express
assertions about program configurations (at given control points) and check
automatically the validity of these assertions for all computations. This leads to
the challenging problem of finding relevant compromises between expressiveness,
automation, and scalability.

An established approach for scalability is the use of Separation Logic
(SL) [18, 24]. Indeed, its support for local reasoning based on the “frame rule”
leads to compact proofs that can be handled efficiently. However,
finding expressive fragments of SL for writing program assertions that enable
efficient automated validation of the verification conditions remains a major
issue. Typically, SL is used in combination with inductive definitions, which
provide a natural description of the data structures manipulated by a program.

* Zhilin Wu is supported by the NSFC projects (No. 61100062, 61272135, and 
61472474), and the visiting researcher program of China Scholarship Council. This 
work was supported by the ANR project Vecolib (ANR-14-CE28-0018). 
Moreover, since program proofs themselves are based on induction, using
inductive definitions instead of universal quantifiers (as in approaches based on
first-order logic) enables scalable automation, especially for recursive programs
which traverse the data structure according to their inductive definition, e.g., [35].
Nevertheless, automating the validation of the verification conditions generated for
iterative programs, which traverse the data structures using while loops,
remains a challenge. The loop invariants use inductive definitions for fragments
of data structures, traversed during a partial execution of the loop, and proving
the inductiveness of these invariants requires non-trivial lemmas relating
(compositions of) such inductive definitions. Most existing works require that
these lemmas be provided by the user of the verification system, e.g., [8, 17, 22],
or they use translations of SL to first-order logic to avoid this problem. However,
the latter approaches work only for rather limited fragments HQIET]. In general, 
it is difficult to have lemmas relating complex user-defined inductive predicates 
that describe not only the shape of the data structures but also their content. 

To illustrate this difficulty, consider the simple example of a sorted singly 
linked list. The following inductive definition describes a sorted list segment 
from the location E to F, storing a multiset of values M: 

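\[ \mathit{lseg}(E, M, F) ::= E = F \land M = \emptyset \land \mathsf{emp} \tag{1} \]

\[ \mathit{lseg}(E, M, F) ::= \exists X, v, M_1.\; E \mapsto \{(\mathtt{next}, X), (\mathtt{data}, v)\} * \mathit{lseg}(X, M_1, F) \land v \le M_1 \land M = M_1 \cup \{v\} \tag{2} \]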

where $\mathsf{emp}$ denotes the empty heap, $E \mapsto \{(\mathtt{next}, X), (\mathtt{data}, v)\}$ states that the
pointer field next of $E$ points to $X$ while its field data stores the value $v$, and
$*$ is the separating conjunction. Proving inductive invariants of typical sorting
procedures requires such an inductive definition and the following lemma:

\[ \exists E_2.\; \mathit{lseg}(E_1, M_1, E_2) * \mathit{lseg}(E_2, M_2, E_3) \land M_1 \le M_2 \;\Rightarrow\; \exists M.\; \mathit{lseg}(E_1, M, E_3). \]

The data constraints in these lemmas, e.g., $M_1 \le M_2$ (stating that every element
of $M_1$ is less than or equal to all the elements of $M_2$), which become more complex
when reasoning for instance about binary search trees, are an important obstacle
to synthesizing them automatically.

Our work is based on a new class of inductive definitions for describing
fragments of data structures that (i) supports lemmas without additional data
constraints like $M_1 \le M_2$ and (ii) allows these lemmas to be synthesized
automatically using efficiently checkable, almost syntactic, criteria. For instance, we use
a different inductive definition for $\mathit{lseg}$, which introduces an additional
parameter $M'$ that provides a “data port” for appending another sorted list segment,
just like $F$ does for the shape of the list segment:

\[ \mathit{lseg}(E, M, F, M') ::= E = F \land M = M' \land \mathsf{emp} \tag{3} \]

\[ \mathit{lseg}(E, M, F, M') ::= \exists X, v, M_1.\; E \mapsto \{(\mathtt{next}, X), (\mathtt{data}, v)\} * \mathit{lseg}(X, M_1, F, M') \land v \le M_1 \land M = M_1 \cup \{v\} \tag{4} \]

The new definition satisfies the following simpler lemma, which avoids the
introduction of data constraints:

\[ \exists E_2, M_2.\; \mathit{lseg}(E_1, M_1, E_2, M_2) * \mathit{lseg}(E_2, M_2, E_3, M_3) \;\Rightarrow\; \mathit{lseg}(E_1, M_1, E_3, M_3). \tag{5} \]
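
As a concrete illustration of lemma (5), consider a segment from $E_1$ to $E_2$ whose cells store 1 and 3, followed by a segment from $E_2$ to $E_3$ whose cells store 4 and 7, with an empty final data port. The values are chosen here only for the example; they are not taken from the paper's benchmarks.

```latex
\[
\mathit{lseg}(E_1, \{1,3,4,7\}, E_2, \{4,7\}) * \mathit{lseg}(E_2, \{4,7\}, E_3, \emptyset)
\;\Rightarrow\;
\mathit{lseg}(E_1, \{1,3,4,7\}, E_3, \emptyset)
\]
% Unfolding the first atom with rule (4) twice and then rule (3):
%   first cell:  v = 1,  M_1 = \{3,4,7\},  1 \le \{3,4,7\}
%   second cell: v = 3,  M_1 = \{4,7\},    3 \le \{4,7\}
%   base rule:   E = F and M = M' = \{4,7\}
```

The ordering between the two segments is already enforced inside the first atom, because its cumulative multiset includes the contents of the data port. This is why no side condition such as $M_1 \le M_2$ is needed in the lemma.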
Besides such “composition” lemmas (formally defined in Sec. 4), we define (in
Sec. 5) other classes of lemmas needed in program proofs and we provide efficient
criteria for generating them automatically. Moreover, we propose (in Sec. 6) a
proof strategy using such lemmas, based on simple syntactic matchings of
spatial atoms (points-to atoms or predicate atoms like $\mathit{lseg}$) and reductions to SMT
solvers for dealing with the data constraints. We show experimentally (in Sec. 7)
that this proof strategy is powerful enough to deal efficiently with sophisticated
benchmarks, e.g., the verification conditions generated from the iterative procedures
for searching, inserting, or deleting elements in binary search trees, red-black
trees, and AVL trees. The appendix contains the proofs
of theorems and additional classes of lemmas.


2 Motivating Example 

Fig. 1 lists an iterative implementation of a search procedure for binary search
trees (BSTs). The property that $E$ points to the root of a BST storing a multiset
of values $M$ is expressed by the following inductively-defined predicate:

\[ \mathit{bst}(E, M) ::= E = \mathsf{nil} \land M = \emptyset \land \mathsf{emp} \tag{6} \]

\[ \mathit{bst}(E, M) ::= \exists X, Y, M_1, M_2, v.\; E \mapsto \{(\mathtt{left}, X), (\mathtt{right}, Y), (\mathtt{data}, v)\} * \mathit{bst}(X, M_1) * \mathit{bst}(Y, M_2) \land M = \{v\} \cup M_1 \cup M_2 \land M_1 < v < M_2 \tag{7} \]

The predicate $\mathit{bst}(E, M)$ is defined by two rules describing empty (eq. (6)) and
non-empty trees (eq. (7)). The body (right-hand side) of each rule is a conjunction
of a pure formula, formed of (dis)equalities between location variables
(e.g. $E = \mathsf{nil}$) and data constraints (e.g. $M = \emptyset$), and a spatial formula
describing the structure of the heap. The data constraints in eq. (7) define $M$ to
be the multiset of values stored in the tree, and state the sortedness property
of BSTs.

The precondition of search is $\mathit{bst}(\mathtt{root}, M_0)$, where $M_0$ is a ghost variable
denoting the multiset of values stored in the tree, while its postcondition
is $\mathit{bst}(\mathtt{root}, M_0) \land (\mathtt{key} \in M_0 \rightarrow \mathtt{ret} = 1) \land (\mathtt{key} \notin M_0 \rightarrow \mathtt{ret} = 0)$, where ret
denotes the return value.
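
The data part of this specification can be exercised on a concrete tree. The sketch below is only a test harness, not part of the verification approach: it assumes a `struct Tree` with fields `left`, `right`, and `data` (the field names used in the predicate), and it introduces three hypothetical helpers, `member`, `post_holds`, and `example_bst`, that do not appear in the paper.

```c
#include <stddef.h>

struct Tree {
    struct Tree *left;
    struct Tree *right;
    int data;
};

/* Iterative BST search, as in Fig. 1. */
int search(struct Tree *root, int key) {
    struct Tree *t = root;
    while (t != NULL) {
        if (t->data == key)
            return 1;
        else if (t->data > key)
            t = t->left;
        else
            t = t->right;
    }
    return 0;
}

/* Hypothetical oracle for "key in M0": exhaustive traversal that does
 * not rely on the ordering of the tree. */
int member(const struct Tree *t, int key) {
    if (t == NULL)
        return 0;
    return t->data == key || member(t->left, key) || member(t->right, key);
}

/* Data part of the postcondition: ret = 1 iff key is stored in the tree. */
int post_holds(struct Tree *root, int key) {
    return search(root, key) == member(root, key);
}

/* Hypothetical example: a BST storing the multiset {2, 5, 8}. */
struct Tree *example_bst(void) {
    static struct Tree n2 = { NULL, NULL, 2 };
    static struct Tree n8 = { NULL, NULL, 8 };
    static struct Tree n5 = { &n2, &n8, 5 };
    return &n5;
}
```

A run-time check of this kind only samples the specification on particular inputs; the entailments discussed in the rest of this section are what establish it for all trees satisfying $\mathit{bst}$.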

The while loop traverses the BST in a top-down manner using the pointer
variable t. This variable decomposes the heap into two domain-disjoint
sub-heaps: the tree rooted at t, and the truncated tree rooted at root which contains
a “hole” at t. To specify the invariant of this loop, we define another predicate
$\mathit{bsthole}(E, M_1, F, M_2)$ describing “truncated” BSTs with one hole $F$ as follows:

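    int search(struct Tree* root,
               int key) {
        struct Tree *t = root;
        while (t != NULL) {
            if (t->data == key)
                return 1;           /* key found at t */
            else if (t->data > key)
                t = t->left;        /* continue in the left subtree */
            else
                t = t->right;       /* continue in the right subtree */
        }
        return 0;                   /* key not stored in the tree */
    }

Fig. 1. Iterative search procedure for binary search trees.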
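\[ \mathit{bsthole}(E, M_1, F, M_2) ::= E = F \land M_1 = M_2 \land \mathsf{emp} \tag{8} \]

\[ \mathit{bsthole}(E, M_1, F, M_2) ::= \exists X, Y, M_3, M_4, v.\; E \mapsto \{(\mathtt{left}, X), (\mathtt{right}, Y), (\mathtt{data}, v)\} * \mathit{bst}(X, M_3) * \mathit{bsthole}(Y, M_4, F, M_2) \land M_1 = \{v\} \cup M_3 \cup M_4 \land M_3 < v < M_4 \tag{9} \]

\[ \mathit{bsthole}(E, M_1, F, M_2) ::= \exists X, Y, M_3, M_4, v.\; E \mapsto \{(\mathtt{left}, X), (\mathtt{right}, Y), (\mathtt{data}, v)\} * \mathit{bsthole}(X, M_3, F, M_2) * \mathit{bst}(Y, M_4) \land M_1 = \{v\} \cup M_3 \cup M_4 \land M_3 < v < M_4 \tag{10} \]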


Intuitively, the parameter $M_2$, interpreted as a multiset of values, is used to
specify that the structure described by $\mathit{bsthole}(E, M_1, F, M_2)$ could be extended
with a BST rooted at $F$ and storing the values in $M_2$, to obtain a BST rooted at
$E$ and storing the values in $M_1$. Thus, the parameter $M_1$ of $\mathit{bsthole}$ is the union
of $M_2$ with the multiset of values stored in the truncated BST represented by
$\mathit{bsthole}(E, M_1, F, M_2)$.

Using $\mathit{bsthole}$, we obtain a succinct specification of the loop invariant:

\[ \mathit{Inv} ::= \exists M_1.\; \mathit{bsthole}(\mathtt{root}, M_0, \mathtt{t}, M_1) * \mathit{bst}(\mathtt{t}, M_1) \land (\mathtt{key} \in M_0 \Leftrightarrow \mathtt{key} \in M_1). \tag{11} \]
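
The data part of the invariant, namely that key occurs in the whole tree iff it occurs in the subtree rooted at t, can be observed at run time. The following sketch instruments the search loop to evaluate that condition at every loop head, including the final one where t is NULL. The names `member`, `search_checked`, `valid_bst`, and `not_a_bst` are hypothetical helpers introduced for this illustration, and the spatial part of the invariant (the shape described by $\mathit{bsthole}$ and $\mathit{bst}$) is not checked.

```c
#include <stddef.h>

struct Tree {
    struct Tree *left;
    struct Tree *right;
    int data;
};

/* Hypothetical oracle: 1 iff key occurs in the tree rooted at t. */
static int member(const struct Tree *t, int key) {
    if (t == NULL)
        return 0;
    return t->data == key || member(t->left, key) || member(t->right, key);
}

/* Search loop of Fig. 1, instrumented: at each loop head it evaluates
 * "key in M0 <=> key in M1" and counts the failures in *violations. */
int search_checked(struct Tree *root, int key, int *violations) {
    struct Tree *t = root;
    *violations = 0;
    for (;;) {
        if (member(root, key) != member(t, key))
            (*violations)++;
        if (t == NULL)
            break;
        if (t->data == key)
            return 1;
        else if (t->data > key)
            t = t->left;
        else
            t = t->right;
    }
    return 0;
}

/* A BST storing {2, 5, 8}. */
struct Tree *valid_bst(void) {
    static struct Tree n2 = { NULL, NULL, 2 };
    static struct Tree n8 = { NULL, NULL, 8 };
    static struct Tree n5 = { &n2, &n8, 5 };
    return &n5;
}

/* Not a BST: 9 is stored to the left of 5, so the precondition fails. */
struct Tree *not_a_bst(void) {
    static struct Tree n9 = { NULL, NULL, 9 };
    static struct Tree n8 = { NULL, NULL, 8 };
    static struct Tree n5 = { &n9, &n8, 5 };
    return &n5;
}
```

On the well-formed tree the counter stays at zero for every key, whereas searching for 9 in the ill-formed tree breaks the condition as soon as the loop moves to the right child. This matches the role of the sortedness constraints $M_3 < v < M_4$ in the proof that the invariant is preserved.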

We illustrate that such inductive definitions are appropriate for automated
reasoning, by taking the following branch of the loop: assume(t != NULL);
assume(t->data > key); t' = t->left (as usual, if statements are transformed
into assume statements and primed variables are introduced in assignments).
The postcondition of Inv w.r.t. this branch, denoted post(Inv), is computed
as usual by unfolding the $\mathit{bst}$ predicate:

\[ \exists M_1, Y, v, M_2, M_3.\; \mathit{bsthole}(\mathtt{root}, M_0, \mathtt{t}, M_1) * \mathtt{t} \mapsto \{(\mathtt{left}, \mathtt{t}'), (\mathtt{right}, Y), (\mathtt{data}, v)\} * \mathit{bst}(\mathtt{t}', M_2) * \mathit{bst}(Y, M_3) \land M_1 = \{v\} \cup M_2 \cup M_3 \land M_2 < v < M_3 \land (\mathtt{key} \in M_0 \Leftrightarrow \mathtt{key} \in M_1) \land v > \mathtt{key}. \tag{12} \]


The preservation of Inv by this branch is expressed by the entailment
$\mathit{post}(\mathit{Inv}) \Rightarrow \mathit{Inv}'$, where $\mathit{Inv}'$ is obtained from Inv by replacing t with t'.

Based on the lemmas, this paper also proposes a deterministic proof strategy
for proving entailments of the form $\varphi_1 \Rightarrow \exists \vec{X}.\, \varphi_2$ (presented in Sec. 6).
The strategy comprises two
steps: (i) enumerating spatial atoms $A$ from $\varphi_2$, and for each of them, carving out
a sub-formula $\varphi_A$ of $\varphi_1$ that entails $A$, where it is required that these sub-formulas
do not share spatial atoms (due to the semantics of the separating conjunction), and
(ii) proving that the data constraints from $\varphi_A$ imply those from $\varphi_2$ (using SMT
solvers). Step (i) may generate constraints on the variables in $\varphi_A$ and $\varphi_2$
that are used in step (ii). If step (ii) succeeds, then the entailment holds.

For instance, by applying this strategy to the entailment $\mathit{post}(\mathit{Inv}) \Rightarrow \mathit{Inv}'$
above, we obtain two goals for step (i), which consist in computing two
sub-formulas of post(Inv) that entail $\exists M_1'.\, \mathit{bsthole}(\mathtt{root}, M_0, \mathtt{t}', M_1')$ and, respectively,
$\exists M_1''.\, \mathit{bst}(\mathtt{t}', M_1'')$. This renaming of existential variables requires adding the
equality $M_1 = M_1' = M_1''$ to $\mathit{Inv}'$. The second goal, for $\exists M_1''.\, \mathit{bst}(\mathtt{t}', M_1'')$, is
solved easily since this atom almost matches the sub-formula $\mathit{bst}(\mathtt{t}', M_2)$. This
matching generates the constraint $M_1'' = M_2$, which provides an instantiation of
the existential variable $M_1''$ useful in proving the entailment between the data
constraints in step (ii).

Footnote: The existential quantifiers in $\varphi_1$ are removed using skolemization.

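Computing a sub-formula that entails $\exists M_1'.\, \mathit{bsthole}(\mathtt{root}, M_0, \mathtt{t}', M_1')$
requires a non-trivial lemma. According to the syntactic criteria defined
in Sec. 4, the predicate $\mathit{bsthole}$ enjoys the following composition lemma:

\[ \big(\exists F, M.\; \mathit{bsthole}(\mathtt{root}, M_0, F, M) * \mathit{bsthole}(F, M, \mathtt{t}', M_1')\big) \;\Rightarrow\; \mathit{bsthole}(\mathtt{root}, M_0, \mathtt{t}', M_1'). \tag{13} \]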

Intuitively, this lemma states that composing two heap structures described by
$\mathit{bsthole}$ results in a structure that satisfies the same predicate. The particular
relation between the arguments of the predicate atoms in the left-hand side is
motivated by the fact that the parameters $F$ and $M$ are supposed to represent
“ports” for composing $\mathit{bsthole}(\mathtt{root}, M_0, F, M)$ with some other similar heap
structures. This property of $F$ and $M$ is characterized syntactically by the fact
that, roughly, $F$ (resp. $M$) occurs only once in the body of each inductive rule of
$\mathit{bsthole}$, and $F$ (resp. $M$) occurs only in an equality with root (resp. $M_0$) in the
base rule (we are referring to the rules (8)–(10) with the parameters of $\mathit{bsthole}$
substituted by $(\mathtt{root}, M_0, F, M)$).

Therefore, the first goal reduces to finding a sub-formula of post(Inv) that
implies the premise of (13), where $M_1'$ remains existentially quantified.
Recursively, we apply the same strategy of enumerating spatial atoms and finding
sub-formulas that entail them. However, we rely on the fact that all the
existential variables denoting the root locations of spatial atoms in the premise
of the lemma, e.g., $F$ in lemma (13), occur as arguments in the only spatial
atom of the premise whose root location is the same as that of the
consequent, i.e., $\mathit{bsthole}(\mathtt{root}, M_0, F, M)$ in lemma (13). Therefore, the first sub-goal,
$\exists F, M.\, \mathit{bsthole}(\mathtt{root}, M_0, F, M)$, matches the atom $\mathit{bsthole}(\mathtt{root}, M_0, \mathtt{t}, M_1)$,
under the constraint $F = \mathtt{t} \land M = M_1$. This constraint is used in solving the second
sub-goal, which now becomes $\exists M_1'.\, \mathit{bsthole}(\mathtt{t}, M_1, \mathtt{t}', M_1')$.

The second sub-goal is proved by unfolding $\mathit{bsthole}$ twice, using first the rule
(10) and then the rule (8), and by matching the resulting spatial atoms with
those in post(Inv) one by one. Assuming that the existential variable $M_1$ from
$\mathit{Inv}'$ is instantiated with $M_2$ from post(Inv) (a fact automatically deduced in the
first step), the data constraints in post(Inv) entail those in $\mathit{Inv}'$. This completes
the proof of $\mathit{post}(\mathit{Inv}) \Rightarrow \mathit{Inv}'$.
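
The two unfolding steps for the second sub-goal can be written out as follows. This is a sketch that reads the steps as the inductive rule of $\mathit{bsthole}$ whose hole lies in the left subtree, followed by the base rule; the bound variables of the rule are renamed to $X, Y', N_3, N_4, u$ (names introduced here) to avoid clashes with the variables of post(Inv).

```latex
\begin{align*}
\mathit{bsthole}(\mathtt{t}, M_1, \mathtt{t}', M_1')
 &\Leftarrow \exists X, Y', N_3, N_4, u.\;
   \mathtt{t} \mapsto \{(\mathtt{left}, X), (\mathtt{right}, Y'), (\mathtt{data}, u)\}
   * \mathit{bsthole}(X, N_3, \mathtt{t}', M_1') * \mathit{bst}(Y', N_4) \\
 &\qquad \land\; M_1 = \{u\} \cup N_3 \cup N_4 \;\land\; N_3 < u < N_4 \\
 &\Leftarrow \exists Y', N_4, u.\;
   \mathtt{t} \mapsto \{(\mathtt{left}, \mathtt{t}'), (\mathtt{right}, Y'), (\mathtt{data}, u)\}
   * \mathit{bst}(Y', N_4) \\
 &\qquad \land\; M_1 = \{u\} \cup M_1' \cup N_4 \;\land\; M_1' < u < N_4
 \qquad (\text{base rule: } X = \mathtt{t}',\ N_3 = M_1')
\end{align*}
% Matching against post(Inv): Y' = Y, u = v, N_4 = M_3, and M_1' = M_2.
```

Under this matching, the remaining data constraints $M_1 = \{v\} \cup M_2 \cup M_3$ and $M_2 < v < M_3$ are exactly those of post(Inv). The equivalence $\mathtt{key} \in M_0 \Leftrightarrow \mathtt{key} \in M_2$ follows because $v > \mathtt{key}$ and $v < M_3$ exclude key from $\{v\} \cup M_3$.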

3 Separation Logic with Inductive Definitions 

Let LVar be a set of location variables, interpreted as heap locations, and DVar
a set of data variables, interpreted as data values stored in the heap, (multi)sets
of values, etc. In addition, let $\mathit{Var} = \mathit{LVar} \cup \mathit{DVar}$. The domain of heap locations
is denoted by $\mathbb{L}$ while the domain of data values stored in the heap is generically
denoted by $\mathbb{D}$. Let $\mathcal{F}$ be a set of pointer fields, interpreted as functions $\mathbb{L} \to \mathbb{L}$,
and $\mathcal{D}$ a set of data fields, interpreted as functions $\mathbb{L} \to \mathbb{D}$. The syntax of the
Separation Logic fragment considered in this paper is defined in Tab. 1.

Table 1. The syntax of the Separation Logic fragment

  $X, Y, E \in \mathit{LVar}$    location variables
  $\vec{F} \in \mathit{Var}^*$    vector of variables
  $x \in \mathit{Var}$    variable
  $\rho \subseteq (\mathcal{F} \times \mathit{LVar}) \cup (\mathcal{D} \times \mathit{DVar})$
  $P \in \mathcal{P}$    predicates
  $\Delta$    formula over data variables

  $\Pi ::= X = Y \mid X \neq Y \mid \Delta \mid \Pi \land \Pi$    pure formulas
  $\Sigma ::= \mathsf{emp} \mid E \mapsto \rho \mid P(E, \vec{F}) \mid \Sigma * \Sigma$    spatial formulas
  $\varphi ::= \Pi \land \Sigma \mid \varphi \lor \varphi \mid \exists x.\, \varphi$    formulas

Formulas are interpreted over pairs $(s, h)$ formed of a stack $s$ and a heap
$h$. The stack $s$ is a function giving values to a finite set of variables (location
or data variables) while the heap $h$ is a function mapping a finite set of pairs
$(\ell, \mathit{pf})$, where $\ell$ is a location and $\mathit{pf}$ is a pointer field, to locations, and a finite set
of pairs $(\ell, \mathit{df})$, where $\mathit{df}$ is a data field, to values in $\mathbb{D}$. In addition, $h$ satisfies
the condition that for each $\ell \in \mathbb{L}$, if $(\ell, \mathit{df}) \in \mathrm{dom}(h)$ for some $\mathit{df} \in \mathcal{D}$, then
$(\ell, \mathit{pf}) \in \mathrm{dom}(h)$ for some $\mathit{pf} \in \mathcal{F}$. Let $\mathrm{dom}(h)$ denote the domain of $h$, and
$\mathrm{ldom}(h)$ denote the set of $\ell \in \mathbb{L}$ such that $(\ell, \mathit{pf}) \in \mathrm{dom}(h)$ for some $\mathit{pf} \in \mathcal{F}$.

Formulas are conjunctions between a pure formula $\Pi$ and a spatial formula
$\Sigma$. Pure formulas characterize the stack $s$ using (dis)equalities between location
variables, e.g., a stack models $x = y$ iff $s(x) = s(y)$, and constraints $\Delta$ over
data variables. We leave $\Delta$ unspecified, though we assume that these constraints belong to
decidable theories, e.g., linear arithmetic or quantifier-free first order theories
over multisets of values. The atom $\mathsf{emp}$ of spatial formulas holds iff the domain
of the heap is empty. The points-to atom $E \mapsto \{(f_i, x_i)\}_{i \in I}$ specifies that the
heap contains exactly one location $E$, and for all $i \in I$, the field $f_i$ of $E$ equals
$x_i$, i.e., $h(s(E), f_i) = s(x_i)$. The predicate atom $P(E, \vec{F})$ specifies a heap segment
rooted at $E$ and shaped by the predicate $P$; the fragment is parameterized by a
set $\mathcal{P}$ of inductively defined predicates, formally defined hereafter.

Let $P \in \mathcal{P}$. An inductive definition of $P$ is a finite set of rules of the form
$P(E, \vec{F}) ::= \exists \vec{Z}.\, \Pi \land \Sigma$, where $\vec{Z} \in \mathit{Var}^*$ is a tuple of variables. A rule $R$ is called
a base rule if $\Sigma$ contains no predicate atoms. Otherwise, it is called an inductive
rule. A base rule $R$ is called spatial-empty if $\Sigma = \mathsf{emp}$. Otherwise, it is called a
spatial-nonempty base rule. For instance, the predicate $\mathit{bst}$ in Sec. 2 is defined
by one spatial-empty base rule and one inductive rule.

We consider a class of restricted inductive definitions that are expressive
enough to deal with intricate data structures (see Sec. 7) while also enabling
efficient proof strategies for establishing the validity of the verification conditions
(see Sec. 6). For each rule $R : P(E, \vec{F}) ::= \exists \vec{Z}.\, \Pi \land \Sigma$ in the definition of a
predicate $P(E, \vec{F}) \in \mathcal{P}$, we assume that:

- If $R$ is inductive, then $\Sigma = \Sigma_1 * \Sigma_2$ and the following conditions hold:

  • the root atoms: $\Sigma_1$ contains only points-to atoms and a unique points-to
    atom starting from $E$, denoted as $E \mapsto \rho$. Also, all the location variables
    from $\vec{Z}$ occur in $\Sigma_1$. $\Sigma_1$ is called the root of $R$ and denoted by $\mathit{root}(R)$.

  • connectedness: the Gaifman graph of $\Sigma_1$, denoted by $G_{\Sigma_1}$, is a connected
    DAG (directed acyclic graph) with the root $E$, that is, every vertex is
    reachable from $E$.

  • predicate atoms: $\Sigma_2$ contains only atoms of the form $Q(Z, \vec{Z}')$, and for
    each such atom, $Z$ is a vertex in $G_{\Sigma_1}$ without outgoing arcs.

- If $R$ is a spatial-nonempty base rule, then $\Sigma$ contains exactly one points-to
  atom $E \mapsto \rho$, for some $\rho$.

The classic acyclic list segment definition [24] satisfies these constraints as well as
the first rule below; the second rule below falsifies the “root atoms” constraint:

\[ \mathit{lsegeven}(E, F) ::= \exists X, Y.\; E \mapsto (\mathtt{next}, X) * X \mapsto (\mathtt{next}, Y) * \mathit{lsegeven}(Y, F) \]

\[ \mathit{lsegb}(E, F) ::= \exists X.\; \mathit{lsegb}(E, X) * X \mapsto (\mathtt{next}, F). \]

Since we disallow the use of negations on top of the spatial atoms, the semantics
of the predicates in $\mathcal{P}$ is defined as usual as a least fixed-point. The class of
inductive definitions defined above is in general undecidable, since with data
fields, inductive definitions can be used to simulate two-counter machines.

A variable substitution $\eta$ is a mapping from a finite subset of Var to the set
of terms over the respective domains. For instance, if $X \in \mathit{LVar}$ and $v, v_1 \in \mathit{DVar}$
are integer variables, then the mapping $\eta = \{X \to \mathsf{nil},\ v \to v_1 + 5\}$ is a variable
substitution. We denote by $\mathit{free}(\psi)$ the set of free variables of a formula $\psi$.

4 Composition Lemmas 

As we have seen in the motivating example, the predicate $\mathit{bsthole}(E, M_1, F, M_2)$
satisfies the property that composing two heap structures described by this
predicate results in a heap structure satisfying the same predicate. We call this
property a composition lemma. We define simple and uniform syntactic criteria which,
if satisfied by a predicate, guarantee that the composition lemma holds.

The main idea is to divide the parameters of inductively defined predicates
into three categories: the source parameters $\vec{\alpha} = (E, C)$, the hole parameters
$\vec{\beta} = (F, H)$, and the static parameters $\vec{\xi} \in \mathit{Var}^*$, where $E, F \in \mathit{LVar}$ are called
the source and, resp., the hole location parameter and $C, H \in \mathit{DVar}$ are called
the cumulative and, resp., the hole data parameter.

Let $\mathcal{P}$ be a set of inductively defined predicates and $P \in \mathcal{P}$ with the
parameters $(\vec{\alpha}, \vec{\beta}, \vec{\xi})$. Then $P$ is said to be syntactically compositional if the inductive
definition of $P$ contains exactly one base rule, and at least one inductive rule,
and the rules of $P$ are of one of the following forms:

- Base rule: $P(\vec{\alpha}, \vec{\beta}, \vec{\xi}) ::= \alpha_1 = \beta_1 \land \alpha_2 = \beta_2 \land \mathsf{emp}$. Note that here the
  points-to atoms are disallowed.

- Inductive rule: $P(\vec{\alpha}, \vec{\beta}, \vec{\xi}) ::= \exists \vec{Z}.\, \Pi \land \Sigma$, with (a) $\Sigma = \Sigma_1 * \Sigma_2 * P(\vec{\gamma}, \vec{\beta}, \vec{\xi})$,
  (b) $\Sigma_1$ contains only points-to atoms, and at least one, (c) $\Sigma_2$ contains only
  predicate atoms, and possibly none, (d) $\vec{\gamma} \subseteq \vec{Z}$, and (e) the variables in $\vec{\beta}$ do
  not occur elsewhere in $\Pi \land \Sigma$, i.e., not in $\Pi$, or $\Sigma_1$, or $\Sigma_2$, or $\vec{\gamma}$. Note that the
  inductive rule also satisfies the constraints “root atoms” and “connectedness”
  introduced in Sec. 3. In addition, $\Sigma_2$ may contain $P$ atoms.

One may easily check that both the predicate $\mathit{lseg}(E, M, F, M')$ in eq. (3)–(4)
and the predicate $\mathit{bsthole}(E, M_1, F, M_2)$ in eq. (8)–(10) are syntactically
compositional, while the predicate $\mathit{lseg}(E, M, F)$ in eq. (1)–(2) is not.
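
To make the check concrete, the four-parameter $\mathit{lseg}$ can be read against the two rule shapes as follows. The decomposition is spelled out here for illustration, with $\epsilon$ denoting the empty tuple of static parameters.

```latex
\[
\vec{\alpha} = (E, M), \qquad \vec{\beta} = (F, M'), \qquad
\vec{\xi} = \epsilon, \qquad \vec{\gamma} = (X, M_1)
\]
\[
\text{base rule:}\quad
\underbrace{E = F}_{\alpha_1 = \beta_1} \;\land\;
\underbrace{M = M'}_{\alpha_2 = \beta_2} \;\land\; \mathsf{emp}
\]
\[
\text{inductive rule:}\quad \exists X, v, M_1.\;
\underbrace{v \le M_1 \land M = M_1 \cup \{v\}}_{\Pi} \;\land\;
\underbrace{E \mapsto \{(\mathtt{next}, X), (\mathtt{data}, v)\}}_{\Sigma_1} *
\underbrace{\mathsf{emp}}_{\Sigma_2} *
\underbrace{\mathit{lseg}(X, M_1, F, M')}_{P(\vec{\gamma}, \vec{\beta}, \vec{\xi})}
\]
```

The hole parameters $F$ and $M'$ occur only in the recursive atom, as the inductive rule shape requires. The three-parameter $\mathit{lseg}$ has no hole data parameter, and its base rule contains $M = \emptyset$, which is not of the form $\alpha_2 = \beta_2$.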

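A predicate $P \in \mathcal{P}$ with the parameters $(\vec{\alpha}, \vec{\beta}, \vec{\xi})$ is said to be semantically
compositional if the entailment $\exists \vec{\beta}.\, P(\vec{\alpha}, \vec{\beta}, \vec{\xi}) * P(\vec{\beta}, \vec{\gamma}, \vec{\xi}) \Rightarrow P(\vec{\alpha}, \vec{\gamma}, \vec{\xi})$ holds.
For simplicity, we assume that $\vec{\alpha}$ and $\vec{\beta}$ consist of exactly one location parameter
and one data parameter.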
Theorem 1. Let $\mathcal{P}$ be a set of inductively defined predicates. If $P \in \mathcal{P}$ is
syntactically compositional, then $P$ is semantically compositional.

The proof of Thm. 1 is done (see [12]) by induction on the size of the domain of
the heap structures. Suppose $(s, h) \models P(\vec{\alpha}, \vec{\beta}, \vec{\xi}) * P(\vec{\beta}, \vec{\gamma}, \vec{\xi})$; then either
$s(\vec{\alpha}) = s(\vec{\beta})$ or $s(\vec{\alpha}) \neq s(\vec{\beta})$. If the former situation occurs, then $(s, h) \models P(\vec{\alpha}, \vec{\gamma}, \vec{\xi})$
follows immediately. Otherwise, the predicate $P(\vec{\alpha}, \vec{\beta}, \vec{\xi})$ is unfolded by using
some inductive rule of $P$, and the induction hypothesis can be applied to a
sub-heap of smaller size. Then $(s, h) \models P(\vec{\alpha}, \vec{\gamma}, \vec{\xi})$ can be deduced by utilizing the
property that the hole parameters occur only once in each inductive rule of $P$.

Remark 1. The syntactically compositional predicates are rather general in the 
sense that they allow nestings of predicates, branchings (e.g. trees), as well as 
data and size constraints. Therefore, composition lemmas can be obtained for 
complex data structures like nested lists, AVL trees, red-black trees, and so on. 
In addition, although lemmas have been widely used in the literature, we are 
not aware of any work that uses the composition lemmas as simple and elegant 
as those introduced above, when data and size constraints are included. 

5 Derived Lemmas 

Theorem 1 provides a means to obtain lemmas for one single syntactically
compositional predicate. In the following, based on the syntactic compositionality,
we demonstrate how to derive additional lemmas describing relationships
between different predicates (proofs are detailed in [12]). We identify three
categories of derived lemmas: “completion” lemmas, “stronger” lemmas, and
“static-parameter contraction” lemmas. Based on our experience with the experiments
(cf. Sec. 7) and the examples from the literature, we believe that the composition
lemmas as well as the derived ones are natural, essential, and general enough
for the verification of programs manipulating dynamic data structures. For
instance, the “composition” lemmas and “completion” lemmas are widely used in
our experiments, and the “stronger” lemmas are used to check the verification
conditions for rebalancing AVL trees and red-black trees. While “static-parameter
contraction” lemmas are not used in our experiments, they could also be useful,
e.g., for the verification of programs manipulating lists with tail pointers.

5.1 The “completion” lemmas 

We first consider the “completion” lemmas which describe relationships between
incomplete data structures (e.g., binary search trees with one hole) and complete
data structures (e.g., binary search trees). For example, the following lemma is
valid for the predicates $\mathit{bsthole}$ and $\mathit{bst}$:

\[ \exists F, M_2.\; \mathit{bsthole}(E, M_1, F, M_2) * \mathit{bst}(F, M_2) \;\Rightarrow\; \mathit{bst}(E, M_1). \]

Notice that the rules defining $\mathit{bst}(E, M)$ can be obtained from those of
$\mathit{bsthole}(E, M_1, F, M_2)$ by applying the variable substitution
$\eta = \{F \to \mathsf{nil},\ M_2 \to \emptyset\}$ (modulo the variable renaming $M_1$ by $M$). This observation is
essential to establish the “completion” lemma and it is generalized to arbitrary
syntactically compositional predicates as follows.




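Let $P \in \mathcal{P}$ be a syntactically compositional predicate with the parameters
$(\vec{\alpha}, \vec{\beta}, \vec{\xi})$, and $P' \in \mathcal{P}$ a predicate with the parameters $(\vec{\alpha}, \vec{\xi})$. Then $P'$ is a
completion of $P$ with respect to a pair of constants $\vec{c} = c_1 c_2$, if the rules of
$P'$ are obtained from the rules of $P$ by applying the variable substitution
$\eta = \{\beta_1 \to c_1,\ \beta_2 \to c_2\}$. More precisely,

- let $\alpha_1 = \beta_1 \land \alpha_2 = \beta_2 \land \mathsf{emp}$ be the base rule of $P$, then $P'$ contains only one
  base rule, that is, $\alpha_1 = c_1 \land \alpha_2 = c_2 \land \mathsf{emp}$,

- the set of inductive rules of $P'$ is obtained from those of $P$ as follows: let
  $P(\vec{\alpha}, \vec{\beta}, \vec{\xi}) ::= \exists \vec{Z}.\, \Pi \land \Sigma_1 * \Sigma_2 * P(\vec{\gamma}, \vec{\beta}, \vec{\xi})$ be an inductive rule of $P$, then
  $P'(\vec{\alpha}, \vec{\xi}) ::= \exists \vec{Z}.\, \Pi \land \Sigma_1 * \Sigma_2 * P'(\vec{\gamma}, \vec{\xi})$ is an inductive rule of $P'$ (recall
  that $\vec{\beta}$ does not occur in $\Pi$, $\Sigma_1$, $\Sigma_2$, $\vec{\gamma}$).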


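Theorem 2. Let $P(\vec{\alpha}, \vec{\beta}, \vec{\xi}) \in \mathcal{P}$ be a syntactically compositional predicate, and
$P'(\vec{\alpha}, \vec{\xi}) \in \mathcal{P}$. If $P'$ is a completion of $P$ with respect to $\vec{c}$, then
$P'(\vec{\alpha}, \vec{\xi}) \Leftrightarrow P(\vec{\alpha}, \vec{c}, \vec{\xi})$ and $\exists \vec{\beta}.\, P(\vec{\alpha}, \vec{\beta}, \vec{\xi}) * P'(\vec{\beta}, \vec{\xi}) \Rightarrow P'(\vec{\alpha}, \vec{\xi})$ hold.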
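
For the running example, the construction can be traced rule by rule. The following sketch applies the substitution $\eta = \{F \to \mathsf{nil},\ M_2 \to \emptyset\}$ to the base rule of $\mathit{bsthole}$ and replaces the recursive $\mathit{bsthole}$ atom by a $\mathit{bst}$ atom in the inductive rules, so that $\mathit{bst}$ is the completion of $\mathit{bsthole}$ with respect to $\vec{c} = (\mathsf{nil}, \emptyset)$.

```latex
\begin{align*}
\text{base rule of } \mathit{bsthole}:\quad
  & E = F \land M_1 = M_2 \land \mathsf{emp}
  \;\;\longmapsto\;\;
  E = \mathsf{nil} \land M_1 = \emptyset \land \mathsf{emp} \\
\text{inductive rules of } \mathit{bsthole}:\quad
  & \mathit{bst}(X, M_3) * \mathit{bsthole}(Y, M_4, F, M_2)
  \;\;\longmapsto\;\;
  \mathit{bst}(X, M_3) * \mathit{bst}(Y, M_4) \\
  & \mathit{bsthole}(X, M_3, F, M_2) * \mathit{bst}(Y, M_4)
  \;\;\longmapsto\;\;
  \mathit{bst}(X, M_3) * \mathit{bst}(Y, M_4) \\
\text{instance of Thm.~2}:\quad
  & \mathit{bst}(E, M_1) \Leftrightarrow \mathit{bsthole}(E, M_1, \mathsf{nil}, \emptyset)
\end{align*}
```

Both inductive rules of $\mathit{bsthole}$ collapse to the single inductive rule of $\mathit{bst}$, since the points-to atom and the data constraints are left unchanged by the substitution.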

5.2 The “stronger” lemmas 

We illustrate this class of lemmas on the example of binary search trees.
Let $\mathit{natbsth}(E, M_1, F, M_2)$ be the predicate defined by the same rules as
$\mathit{bsthole}(E, M_1, F, M_2)$ (i.e., eq. (8)–(10)), except that $M_3 \ge 0$ ($M_3$ is an
existential variable) is added to the body of each inductive rule (i.e., eq. (9) and
(10)). Then we say that $\mathit{natbsth}$ is stronger than $\mathit{bsthole}$, since for each rule $R'$
of $\mathit{natbsth}$, there is a rule $R$ of $\mathit{bsthole}$, such that the body of $R'$ entails the body
of $R$. This “stronger” relation guarantees that the following lemmas hold:

\[ \mathit{natbsth}(E, M_1, F, M_2) \;\Rightarrow\; \mathit{bsthole}(E, M_1, F, M_2) \]

\[ \exists E_2, M_2.\; \mathit{natbsth}(E_1, M_1, E_2, M_2) * \mathit{bsthole}(E_2, M_2, E_3, M_3) \;\Rightarrow\; \mathit{bsthole}(E_1, M_1, E_3, M_3). \]


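In general, for two syntactically compositional predicates $P, P' \in \mathcal{P}$ with the
same set of parameters $(\vec{\alpha}, \vec{\beta}, \vec{\xi})$, $P'$ is said to be stronger than $P$ if for each
inductive rule $P'(\vec{\alpha}, \vec{\beta}, \vec{\xi}) ::= \exists \vec{Z}.\, \Pi' \land \Sigma_1 * \Sigma_2 * P'(\vec{\gamma}, \vec{\beta}, \vec{\xi})$, there is an inductive
rule $P(\vec{\alpha}, \vec{\beta}, \vec{\xi}) ::= \exists \vec{Z}.\, \Pi \land \Sigma_1 * \Sigma_2 * P(\vec{\gamma}, \vec{\beta}, \vec{\xi})$ such that $\Pi' \Rightarrow \Pi$ holds. The
following result is a consequence of Thm. 1.

Theorem 3. Let $P(\vec{\alpha}, \vec{\beta}, \vec{\xi}), P'(\vec{\alpha}, \vec{\beta}, \vec{\xi}) \in \mathcal{P}$ be two syntactically
compositional predicates. If $P'$ is stronger than $P$, then the entailments
$P'(\vec{\alpha}, \vec{\beta}, \vec{\xi}) \Rightarrow P(\vec{\alpha}, \vec{\beta}, \vec{\xi})$ and $\exists \vec{\beta}.\, P'(\vec{\alpha}, \vec{\beta}, \vec{\xi}) * P(\vec{\beta}, \vec{\gamma}, \vec{\xi}) \Rightarrow P(\vec{\alpha}, \vec{\gamma}, \vec{\xi})$ hold.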

The “stronger” relation defined above requires that the spatial formulas in the 
inductive rules of P and P' are the same. This constraint can be relaxed by only 
requiring that the body of each inductive rule of P' is stronger than a formula 
obtained by unfolding an inductive rule of P for a bounded number of times. 
This relaxed constraint allows generating additional lemmas, e.g., the lemmas 
relating the predicates for list segments of even length and list segments. 
5.3 The “static-parameter contraction” lemmas

Let $\mathit{tailbsth}(E, M_1, F, M_2)$ (resp. $\mathit{stabsth}(E, M_1, F, M_2, B)$) be the
predicate defined by the same rules as $\mathit{bsthole}(E, M_1, F, M_2)$, with the
modification that the points-to atom in each inductive rule is replaced by
$E \mapsto \{(\mathtt{left}, X), (\mathtt{right}, Y), (\mathtt{tail}, F), (\mathtt{data}, v)\}$ (resp.
$E \mapsto \{(\mathtt{left}, X), (\mathtt{right}, Y), (\mathtt{tail}, B), (\mathtt{data}, v)\}$). Intuitively, $\mathit{tailbsth}$ (resp. $\mathit{stabsth}$)
is obtained from $\mathit{bsthole}$ by adding a tail pointer to $F$ (resp. $B$). Then $\mathit{tailbsth}$
is not syntactically compositional since $F$ occurs in the points-to atoms of the
inductive rules. On the other hand, $\mathit{stabsth}$ is syntactically compositional.

From the above description, it is easy to observe that the inductive definition
of $\mathit{tailbsth}(E, M_1, F, M_2)$ can be obtained from that of $\mathit{stabsth}(E, M_1, F, M_2, B)$
by replacing $B$ with $F$. Then the lemma
$\mathit{tailbsth}(E, M_1, F, M_2) \Leftrightarrow \mathit{stabsth}(E, M_1, F, M_2, F)$ holds. From this, we further deduce the lemma

\[ \exists E_2, M_2.\; \mathit{stabsth}(E_1, M_1, E_2, M_2, E_3) * \mathit{tailbsth}(E_2, M_2, E_3, M_3) \;\Rightarrow\; \mathit{tailbsth}(E_1, M_1, E_3, M_3). \]

We call the aforementioned replacement of $B$ by $F$ in the inductive definition of
$\mathit{stabsth}$ the “static-parameter contraction”. This idea can be generalized to
arbitrary syntactically compositional predicates as follows.

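Let $P \in \mathcal{P}$ be a syntactically compositional predicate with the parameters
$(\vec{\alpha}, \vec{\beta}, \vec{\xi})$, $P' \in \mathcal{P}$ be an inductive predicate with the parameters $(\vec{\alpha}, \vec{\beta}, \vec{\xi}')$,
$\vec{\xi} = \xi_1 \ldots \xi_k$, and $\vec{\xi}' = \xi'_1 \ldots \xi'_{k'}$. Then $P'$ is called a static-parameter contraction of
$P$ if the rules of $P'$ are obtained from those of $P$ by a variable substitution $\eta$
s.t. $\mathrm{dom}(\eta) = \vec{\xi}$, for each $i : 1 \le i \le k$, either $\eta(\xi_i) = \xi_i$, or $\eta(\xi_i) = \beta_j$ for
some $j = 1, 2$ such that $\xi_i$ and $\beta_j$ have the same data type, and $\vec{\xi}'$ is the
tuple obtained from $\eta(\vec{\xi})$ by removing the $\beta_j$'s. The substitution $\eta$ is called the
contraction function.

Theorem 4. Let $P(\vec{\alpha}, \vec{\beta}, \vec{\xi}) \in \mathcal{P}$ be a syntactically compositional predicate and
$P'(\vec{\alpha}, \vec{\beta}, \vec{\xi}') \in \mathcal{P}$ be an inductive predicate. If $P'$ is a static-parameter contraction
of $P$ with the contraction function $\eta$, then $P'(\vec{\alpha}, \vec{\beta}, \vec{\xi}') \Leftrightarrow P(\vec{\alpha}, \vec{\beta}, \eta(\vec{\xi}))$ and
$\exists \vec{\beta}.\, P(\vec{\alpha}, \vec{\beta}, \eta(\vec{\xi})) * P'(\vec{\beta}, \vec{\gamma}, \vec{\xi}') \Rightarrow P'(\vec{\alpha}, \vec{\gamma}, \vec{\xi}')$ hold.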

Remark 2. The lemmas presented in the last two sections are incomplete in 
the sense that they may not cover all the lemmas for a given set of inductive 
predicates. Although various extensions of the lemmas are possible, generating 
all the possible lemmas can be quite complex in general. Thm. 3 in m shows 
that generating all the lemmas is at least EXPTIME-hard, even for a fragment 
restricted to shape properties, without any data or size constraint. 

6 A Proof Strategy Based on Lemmas 

We introduce a proof strategy based on lemmas for proving entailments
$\varphi_1 \Rightarrow \exists \vec{X}.\, \varphi_2$, where $\varphi_1, \varphi_2$ are quantifier-free, and $\vec{X} \in \mathit{DVar}^*$. The proof strategy
treats uniformly the inductive rules defining predicates and the lemmas defined in
Sec. 4–5. Therefore, we also call an inductive rule a lemma. W.l.o.g. we assume that
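\[
(\textsc{Match1})\quad
\frac{\Sigma_1 = \theta(\Sigma_2) \qquad \eta = \theta|_{\vec{X}} \qquad \Pi_1 \land \mathrm{EQ}(\eta) \models \mathrm{EQ}(\theta|_{\mathit{free}(\exists \vec{X}.\, \Sigma_2)})}
     {\Pi_1 \land \Sigma_1 \models^{SUB}_{\eta} \exists \vec{X}.\ \Sigma_2}
\]

\[
(\textsc{Match2})\quad
\frac{\Pi_1 \land \Sigma_1 \models^{SUB}_{\eta} \exists \vec{X}.\ \Sigma_2}
     {\Pi_1 \land \Sigma_1 \models_{\eta} \exists \vec{X}.\ \Sigma_2}
\]

\[
(\textsc{Lemma})\quad
\frac{\Pi_1 \land \Sigma_1 \models^{SUB}_{\eta_1} \exists \vec{Z}'.\ \mathit{root}(L) \qquad \Pi_1 \land \Sigma'_1 \models_{\eta_2} \exists \vec{Z}''.\ \eta_1(\Pi \land \Sigma)}
     {\Pi_1 \land \Sigma_1 * \Sigma'_1 \models_{\eta|_{\vec{X}}} \exists \vec{X}.\ A}
\]

- $L ::= \exists \vec{Z}.\, \Pi \land \mathit{root}(L) * \Sigma \Rightarrow A$ is a lemma,

- $\vec{Z}' = (\vec{X} \cup \vec{Z}) \cap \mathit{free}(\mathit{root}(L))$, $\vec{Z}'' = (\vec{X} \cup \vec{Z}) \cap \mathit{free}(\eta_1(\Pi \land \Sigma))$,

- $\eta = \mathit{ext}_{\Pi}(\eta_1 \cup \eta_2)$ is the extension of $\eta_1 \cup \eta_2$ with $\Pi$ s.t. $\mathrm{dom}(\eta) = \vec{X} \cup \vec{Z}$.

\[
(\textsc{Slice})\quad
\frac{\Pi_1 \land \Sigma_1 \models_{\eta_1} \exists \vec{Z}'.\ A \qquad \Pi_1 \land \Sigma_2 \models_{\eta_2} \exists \vec{Z}''.\ \Sigma \qquad \Pi_1 \land \mathrm{EQ}(\eta) \models \Pi_2}
     {\Pi_1 \land \Sigma_1 * \Sigma_2 \models_{\eta} \exists \vec{X}.\ \Pi_2 \land A * \Sigma}
\]

- $\vec{Z}' = \vec{X} \cap \mathit{free}(A)$, $\vec{Z}'' = \vec{X} \cap \mathit{free}(\Sigma)$,

- $\eta = \mathit{ext}_{\Pi_2}(\eta_1 \cup \eta_2)$ is the extension of $\eta_1 \cup \eta_2$ with $\Pi_2$ s.t. $\mathrm{dom}(\eta) = \vec{X}$.


Fig. 2. The proof rules for checking the entailment $\varphi_1 \Rightarrow \exists \vec{X}.\, \varphi_2$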


$\varphi_1$ is quantifier-free (the existential variables can be skolemized). In addition,
we assume that only data variables are quantified in the right-hand side.

W.l.o.g., we assume that every variable in $\vec{X}$ occurs in at most one spatial
atom of $\varphi_2$ (multiple occurrences of the same variable can be removed by
introducing fresh variables and new equalities in the pure part). Also, we assume that
$\varphi_1$ and $\varphi_2$ are of the form $\Pi \land \Sigma$. In the general case, our proof strategy checks
that for every disjunct $\varphi'_1$ of $\varphi_1$, there is a disjunct $\varphi'_2$ of $\varphi_2$ s.t. $\varphi'_1 \Rightarrow \exists \vec{X}.\, \varphi'_2$.

We present the proof strategy as a set of rules in Fig. 2. For a variable
substitution $\eta$ and a set $X \subseteq \mathit{Var}$, we denote by $\eta|_X$ the restriction of $\eta$ to
$X$. In addition, $\mathrm{EQ}(\eta)$ is the conjunction of the equalities $X = t$ for every $X$
and $t$ such that $\eta(X) = t$. Given two formulas $\varphi_1$ and $\varphi_2$, and a substitution $\eta$
with $\mathrm{dom}(\eta) = \vec{X}$, the judgement $\varphi_1 \models_{\eta} \exists \vec{X}.\, \varphi_2$ denotes that the entailment
$\varphi_1 \Rightarrow \eta(\varphi_2)$ is valid. Therefore, $\eta$ provides an instantiation for the quantified
variables $\vec{X}$ which witnesses the validity.

The rules Match1 and Match2 consider a particular case of $\models_{\eta}$, denoted
using the superscript SUB, where the spatial atoms of $\varphi_2$ are syntactically
matched to the spatial atoms of $\varphi_1$ modulo a variable substitution $\theta$. The
substitution of the existential variables is recorded in $\eta$, while the substitution
of the free variables generates a set of equalities that must be implied by
$\Pi_1 \land EQ(\eta)$. For example, let
$\Pi_1 \land \Sigma_1 ::= w = w' \land E \mapsto \{(f, Y), (d_1, v), (d_2, w)\}$
and $\exists X.\, \Sigma_2 ::= \exists X, v'.\, E \mapsto \{(f, X), (d_1, v'), (d_2, w')\}$, where $d_1$ and $d_2$ are data
fields. If $\theta = \{X \to Y, v' \to v, w' \to w\}$, then $\Sigma_1 = \theta(\Sigma_2)$. The substitution of
the free variable $w'$ from the right-hand side is sound since the equality $w = w'$
occurs in the left-hand side. Therefore, $\Pi_1 \land \Sigma_1 \models^{SUB}_{\theta|_{\{X, v'\}}} \exists X, v'.\, \Sigma_2$ holds.

Footnote (restriction to quantified data variables): We believe that this restriction
is reasonable for the verification conditions appearing in practice, and all the
benchmarks in our experiments are of this form.

Footnote (syntactic matching): In this case, the right-hand side contains no pure constraints.
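
A minimal sketch of the syntactic matching described above, restricted to a single pair of points-to atoms. It assumes that an atom is a pair of a source term and a field dictionary, and it simplifies the side condition: a free variable may be renamed only if the corresponding equality occurs literally among the LHS equalities, whereas the rule asks for an entailment. The function name `match1` and the data layout are illustrative assumptions, not the SPEN implementation.

```python
from typing import Dict, Iterable, Optional, Set, Tuple

# Assumption: a points-to atom is (source, {field: target}); terms are strings.
PointsTo = Tuple[str, Dict[str, str]]


def match1(lhs: PointsTo, rhs: PointsTo, exist_vars: Set[str],
           lhs_equalities: Iterable[Tuple[str, str]]) -> Optional[Dict[str, str]]:
    """Return theta restricted to exist_vars, or None if matching fails."""
    (l_src, l_fields), (r_src, r_fields) = lhs, rhs
    if set(l_fields) != set(r_fields):
        return None
    theta: Dict[str, str] = {}
    pairs = [(r_src, l_src)] + [(r_fields[f], l_fields[f]) for f in sorted(r_fields)]
    for r_term, l_term in pairs:
        # setdefault records the first image and exposes a conflicting second one
        if theta.setdefault(r_term, l_term) != l_term:
            return None
    known = {frozenset(eq) for eq in lhs_equalities}
    for r_term, l_term in theta.items():
        if r_term in exist_vars or r_term == l_term:
            continue
        # Simplification: the equality must appear literally in the LHS.
        if frozenset((r_term, l_term)) not in known:
            return None
    return {x: t for x, t in theta.items() if x in exist_vars}


lhs = ("E", {"f": "Y", "d1": "v", "d2": "w"})
rhs = ("E", {"f": "X", "d1": "v'", "d2": "w'"})
print(match1(lhs, rhs, {"X", "v'"}, [("w", "w'")]))  # instantiation of X and v'
print(match1(lhs, rhs, {"X", "v'"}, []))             # fails: w = w' is unknown
```

The second call illustrates why the equality $w = w'$ in the left-hand side matters: without it, replacing the free variable $w'$ by $w$ would not be justified.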

The rule Lemma applies a lemma $L ::= \exists Z.\, \Pi \land root(L) * \Sigma \Rightarrow A$. It consists
in proving that $\varphi_1$ implies the LHS of the lemma where the variables in $X$
are existentially quantified, i.e., $\exists X \exists Z.\, \Pi \land root(L) * \Sigma$. Notice that $Z$ may
contain existential location variables. Finding suitable instantiations for these
variables relies on the assumption that $root(L)$ in the LHS of $L$ is either a unique
predicate atom or a separating conjunction of points-to atoms rooted at $E$ (the
first parameter of $A$) and $root(L)$ includes all the location variables in $Z$. This
assumption holds for all the inductive rules defining predicates in our fragment
(a consequence of the root and connectedness constraints) and for all the lemmas
defined in Sec. 4-5. The proof that $\varphi_1$ implies $\exists X \exists Z.\, \Pi \land root(L) * \Sigma$ is split
into two sub-goals: (i) proving that a sub-formula of $\varphi_1$ implies $\exists X \exists Z.\, root(L)$
and (ii) proving that a sub-formula of $\varphi_1$ implies $\exists X \exists Z.\, \Pi \land \Sigma$. The sub-goal (i)
relies on syntactic matching using the rule Match1, which results in a quantifier
instantiation $\eta_1$. The substitution $\eta_1$ is used to instantiate existential variables
in $\exists X \exists Z.\, \Pi \land \Sigma$. Notice that, according to the aforementioned assumption,
the location variables in $Z$ are not free in $\eta_1(\Pi \land \Sigma)$. Let $\eta_2$ be the quantifier
instantiation obtained from the second sub-goal. The quantifier instantiation $\eta$
is defined as the extension of $\eta_1 \cup \eta_2$ to the domain $X \cup Z$ by utilizing the
pure constraints $\Pi$ from the lemma. This extension is necessary since some
existentially quantified variables may only occur in $\Pi$, but not in $root(L)$ nor
in $\Sigma$, so they are not covered by $\eta_1 \cup \eta_2$. For instance, if $\Pi$ contains a conjunct
$M = M_1 \cup M_2$ such that $M_1 \in dom(\eta_1)$, $M_2 \in dom(\eta_2)$, and $M \notin dom(\eta_1 \cup \eta_2)$,
then $\eta_1 \cup \eta_2$ is extended to $\eta$ where $\eta(M) = \eta_1(M_1) \cup \eta_2(M_2)$.
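
The extension step can be sketched as a propagation of equalities. The sketch below assumes that the relevant pure conjuncts have the shape "target = union of variables", that instantiations are concrete multisets represented by `collections.Counter`, and that the union is the multiset sum; the function name `extend` is hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple


def extend(eta: Dict[str, Counter],
           unions: List[Tuple[str, List[str]]]) -> Dict[str, Counter]:
    """Extend eta using conjuncts target = op_1 U ... U op_n from the pure part."""
    ext = dict(eta)
    changed = True
    while changed:  # propagate until no new variable can be instantiated
        changed = False
        for target, operands in unions:
            if target not in ext and all(op in ext for op in operands):
                total: Counter = Counter()
                for op in operands:
                    total += ext[op]  # multiset sum (an assumption of this sketch)
                ext[target] = total
                changed = True
    return ext


eta1 = {"M1": Counter([1])}
eta2 = {"M2": Counter([2, 3])}
eta = extend({**eta1, **eta2}, [("M", ["M1", "M2"])])
print(eta["M"])  # the instantiation of M obtained from M1 and M2
```

Variables already in the domain are left untouched, which mirrors the remark that the extension has no effect when the variable is already covered by $\eta_1 \cup \eta_2$.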

The rule Slice chooses a spatial atom $A$ in the RHS and generates two
sub-goals: (i) one that matches $A$ (using the rules Match2 and Lemma) with
a spatial sub-formula of the LHS ($\Sigma_1$) and (ii) another that checks that the
remaining spatial part of the RHS is implied by the remaining part of the LHS.
The quantifier instantiations $\eta_1$ and $\eta_2$ obtained from the two sub-goals are used
to check that the pure constraints in the RHS are implied by the ones in the LHS.
Note that in the rule Slice, it is possible that $\Sigma_2 = \Sigma = emp$.

The rules in Fig. 2 are applied in the order given in the figure. Note that they
focus on disjoint cases w.r.t. the syntax of the RHS. The atom $A$ in Slice is chosen
arbitrarily, since this choice does not affect the efficiency of proving validity.

Footnote (extension of $\eta_1 \cup \eta_2$ in the rule Lemma): The extension depends on the
pure constraints $\Pi$ and could be quite complex in general. In the experiments of
Sec. 7, we use the extension obtained by the propagation of equalities in $\Pi$.

We apply the above proof strategy to the entailment $\varphi_1 \Rightarrow \exists M.\, \varphi_2$ where:

\[
\varphi_1 ::= x_1 \neq nil \land x_2 \neq nil \land v_1 < v_2 \land x_1 \mapsto \{(\mathtt{next}, x_2), (\mathtt{data}, v_1)\} * x_2 \mapsto \{(\mathtt{next}, nil), (\mathtt{data}, v_2)\}
\]
\[
\varphi_2 ::= lseg(x_1, M, nil, \emptyset) \land v_2 \in M,
\]

and $lseg$ has been defined in Sec. 1 (eq. (3)-(4)). The entailment is valid
because it states that two cells linked by next and storing ordered data values
form a sorted list segment. The RHS $\varphi_2$ contains a single spatial atom
and a pure part, so the rule Slice is applied and it generates the sub-goal
$\varphi_1 \models_{\eta} \exists M.\, lseg(x_1, M, nil, \emptyset)$, for which the syntactic matching (rule Match1)
cannot be applied. Instead, we apply the rule Lemma using as lemma the inductive
rule of $lseg$, i.e., eq. (4). We obtain the RHS
$\exists M, X, M_1, v.\, x_1 \mapsto \{(\mathtt{next}, X), (\mathtt{data}, v)\} * lseg(X, M_1, nil, \emptyset) \land M = \{v\} \cup M_1 \land v < M_1$,
where $x_1 \mapsto \{(\mathtt{next}, X), (\mathtt{data}, v)\}$ is the root. The rule Match1 is applied with
$\Pi_1 \land \Sigma_1 ::= x_1 \neq nil \land x_2 \neq nil \land v_1 < v_2 \land x_1 \mapsto \{(\mathtt{next}, x_2), (\mathtt{data}, v_1)\}$
and it returns the substitution $\eta_1 = \{X \to x_2, v \to v_1\}$. The second sub-goal is
$\Pi_1 \land \Sigma_2 \models_{\eta_2} \exists M, M_1.\, \psi'$ where
$\Pi_1 \land \Sigma_2 ::= x_1 \neq nil \land x_2 \neq nil \land v_1 < v_2 \land x_2 \mapsto \{(\mathtt{next}, nil), (\mathtt{data}, v_2)\}$
and $\psi' ::= M = \{v_1\} \cup M_1 \land v_1 < M_1 \land lseg(x_2, M_1, nil, \emptyset)$.
For this sub-goal, we apply the rule Slice, which generates a sub-goal where the
rule Lemma is applied first, using the same lemma, then the rule Slice is applied
again, and finally the rule Lemma is applied with a lemma corresponding
to the base rule of $lseg$, i.e., eq. (3). This generates a quantifier instantiation
$\eta_2 = \{M \to \{v_1, v_2\}, M_1 \to \{v_2\}\}$. Then, $\eta_1 \cup \eta_2$ is extended with the
constraints from the pure part of the lemma, i.e., $M = \{v\} \cup M_1 \land v < M_1$.
Since $M \in dom(\eta_1 \cup \eta_2)$, this extension has no effect. Finally, the rule Slice
checks that $\Pi_1 \land EQ(\eta|_{\{M\}}) \models \Pi_2$ holds, where $EQ(\eta|_{\{M\}}) ::= M = \{v_1, v_2\}$ and
$\Pi_2 ::= v_2 \in M$. The last entailment holds, so the proof of validity is done.
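
As a sanity check on one concrete model (not a replay of the proof rules), the sketch below walks a heap and tests that it is exactly a nil-terminated list with strictly increasing data values, which is what the sorted list segment to nil requires here. The heap encoding, the location names, and the function name `sorted_list_values` are assumptions made for illustration.

```python
from typing import Dict, List, Optional, Tuple

# Assumption: a heap maps an allocated location to its (next, data) fields.
Heap = Dict[str, Tuple[str, int]]


def sorted_list_values(heap: Heap, start: str) -> Optional[List[int]]:
    """Data values of the nil-terminated list at start if the heap is exactly
    that list and the values strictly increase; otherwise None."""
    values: List[int] = []
    visited = set()
    cur = start
    while cur != "nil":
        if cur not in heap or cur in visited:
            return None  # dangling pointer or cycle
        visited.add(cur)
        nxt, data = heap[cur]
        if values and not values[-1] < data:
            return None  # ordering constraint violated
        values.append(data)
        cur = nxt
    # Every allocated cell must belong to the list (no extra heap cells).
    return values if visited == set(heap) else None


# Two cells with v1 = 3 < v2 = 7, as in the left-hand side of the example.
heap = {"x1": ("x2", 3), "x2": ("nil", 7)}
values = sorted_list_values(heap, "x1")
print(values is not None and 7 in values)  # v2 belongs to the collected values
```

Swapping the two data values makes the function return `None`, matching the role of the constraint $v_1 < v_2$ in the left-hand side.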

The following theorem states the correctness of the proof rules. Moreover,
since we assume a finite set of lemmas, and every application of a lemma $L$
removes at least one spatial atom from $\varphi_1$ (the atoms matched to $root(L)$), the
termination of the applications of the rule Lemma is guaranteed.

Theorem 5. Let $\varphi_1$ and $\exists X.\, \varphi_2$ be two formulas such that $X$ contains only data
variables. If $\varphi_1 \models_{\eta} \exists X.\, \varphi_2$ for some $\eta$, then $\varphi_1 \Rightarrow \exists X.\, \varphi_2$.

7 Experimental results 

We have extended the tool SPEN with the proof strategy proposed in this
paper. The entailments are written in an extension of the SMT-LIB format used in
the competition SL-COMP’14 for separation logic solvers. The tool outputs
SAT, UNSAT, or UNKNOWN, together with a diagnosis in each case.

The solver starts with a normalization step, based on the boolean abstractions
described in [11], which saturates the input formulas with (dis)equalities
between location variables implied by the semantics of separating conjunction.
The entailments of data constraints are translated into satisfiability problems in
the theory of integers with uninterpreted functions, discharged using an SMT
solver dealing with this theory.

We have evaluated the proposed approach on two sets of benchmarks
(available at http://www.liafa.univ-paris-diderot.fr/spen/benchmarks.html):

RDBI: verification conditions for proving the correctness of iterative procedures
(delete, insert, search) over recursive data structures storing integer data:
sorted lists, binary search trees (BST), AVL trees, and red-black trees (RBT).

SL-COMP’14: problems in the SL-COMP’14 benchmark, without data constraints,
where the inductive definitions are syntactically compositional.

Table 2. Experimental results on benchmark RDBI

Data structure  Procedure  #VC  Lemma (#b, #r, #p, #c, #d)  =>D  Time-SPEN (s)  Time-SMT (s)
sorted lists    search       4  (1, 3, 3, 1, 3)               5  1.108          0.10
                insert       8  (4, 6, 3, 1, 2)               7  2.902          0.15
                delete       4  (2, 2, 4, 1, 1)               6  1.108          0.10
BST             search       4  (2, 3, 6, 2, 2)               6  1.191          0.15
                insert      14  (15, 18, 27, 4, 6)           19  3.911          0.55
                delete      25  (13, 19, 82, 8, 5)           23  8.412          0.58
AVL             search       4  (2, 3, 6, 2, 2)               6  1.573          0.15
                insert      22  (18, 28, 74, 6, 8)           66  6.393          1.33
RBT             search       4  (2, 3, 6, 2, 2)               6  1.171          0.15
                insert      21  (27, 45, 101, 7, 10)         80  6.962          2.53

Table 3. Experimental results on benchmark SL-COMP’14

Data structure       #VC  Lemma (#b, #r, #p, #c, #d)  Time SPEN (s)  Time SPEN-TA (s)
Nested linked lists   16  (17, 47, 14, 8, 0)          4.428          4.382
Skip lists 2 levels    4  (11, 16, 1, 1, 0)           1.629          1.636
Skip lists 3 levels   10  (16, 32, 29, 17, 0)         3.858          3.485

Tab. 2 provides the experimental results for RDBI. The column #VC gives the
number of verification conditions considered for each procedure. The column
Lemma provides statistics about the lemma applications as follows: #b and #r
are the numbers of applications of the lemmas corresponding to base resp.
inductive rules, #c and #d are the numbers of applications of the composition
resp. derived lemmas, and #p is the number of predicates matched syntactically,
without applying lemmas. Column =>D gives the number of entailments between
data constraints generated by SPEN. Column Time-SPEN gives the “system” time
spent by SPEN on all verification conditions of a function, excluding the time
taken to solve the data constraints by the SMT solver, which is given in the
column Time-SMT.
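
For readers who want aggregate figures, the per-procedure numbers of the RDBI table can be summed directly. The sketch below only re-enters the #VC, Time-SPEN, and Time-SMT columns reported above; the tuple layout and the function name `totals` are illustrative choices.

```python
# Rows copied from the RDBI table:
# (data structure, procedure, #VC, Time-SPEN in s, Time-SMT in s)
ROWS = [
    ("sorted lists", "search", 4, 1.108, 0.10),
    ("sorted lists", "insert", 8, 2.902, 0.15),
    ("sorted lists", "delete", 4, 1.108, 0.10),
    ("BST", "search", 4, 1.191, 0.15),
    ("BST", "insert", 14, 3.911, 0.55),
    ("BST", "delete", 25, 8.412, 0.58),
    ("AVL", "search", 4, 1.573, 0.15),
    ("AVL", "insert", 22, 6.393, 1.33),
    ("RBT", "search", 4, 1.171, 0.15),
    ("RBT", "insert", 21, 6.962, 2.53),
]


def totals(rows):
    """Total number of VCs, total SPEN time, and total SMT time."""
    n_vc = sum(r[2] for r in rows)
    t_spen = sum(r[3] for r in rows)
    t_smt = sum(r[4] for r in rows)
    return n_vc, t_spen, t_smt


n_vc, t_spen, t_smt = totals(ROWS)
print(f"{n_vc} VCs, SPEN {t_spen:.3f} s, SMT {t_smt:.2f} s")
```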

Tab. 3 provides a comparison of our approach (column SPEN) with the decision
procedure in [11] (column SPEN-TA) on the same set of benchmarks from
SL-COMP’14. The times of the two decision procedures are almost the same,
which indicates that our approach, as an extension of that in [11], is robust.

8 Related work 

There have been many works on the verification of programs manipulating
mutable data structures in general and the use of separation logic in particular.
In the following, we discuss those which are closer to our approach.

Footnote (experimental setup): The evaluations used a 2.53 GHz Intel processor with 2 GB,
running Linux on VBox. SPEN does not implement a batch mode; each entailment is dealt
with separately, including the generation of lemmas. The SMT solver is called on the
files generated by SPEN.

The prover SLEEK provides proof strategies for proving entailments
of SL formulas. These strategies are also based on lemmas relating inductive
definitions but, differently from our approach, these lemmas are supposed to
be given by the user (SLEEK can prove the correctness of the lemmas once
they are provided). Our approach is able to discover and synthesize the lemmas
systematically, efficiently, and automatically.

The natural proof approach DRYAD can prove automatically the
correctness of programs against specifications given by separation logic
formulas with inductive definitions. Nevertheless, the lemmas are still supposed to
be provided by the users in DRYAD, while our approach can generate the
lemmas automatically. Moreover, DRYAD does not provide an independent solver
to decide the entailment of separation logic formulas, which makes it difficult to
compare the performance of our tool with that of DRYAD. In addition, the
inductive definitions used in our paper enable succinct lemmas, far less complex
than those used in DRYAD, which include complex constraints on data variables
and the magic wand.

The method of cyclic proofs introduced by [S] and recently extended
proves the entailment of two SL formulas by using induction on the paths of proof
trees. It does not generate lemmas, but it is able to (soundly)
check intricate lemmas given by the user, even ones which are out of the scope
of our method, e.g., lemmas concerning the predicate RList, which is defined by
unfolding the list segments from the end instead of the beginning. The cyclic
proofs method can be seen as a dynamic lemma generation using complex
reasoning on proof trees, while our method generates lemmas statically by simple
checks on the inductive definitions. We think that our lemma generator could
be used in the cyclic proof method to cut proof trees.

The tool SLIDE [Ml [15] provides decision procedures for fragments of SL 
based on reductions to the language inclusion problem of tree automata. Their 
fragments contain no data or size constraints. In addition, the EXPTIME lower 
bound complexity is an important obstacle for scalability. Our previous work [11]
introduces a decision procedure based on reductions to the membership problem
of tree automata, which however is not capable of dealing with data constraints.

The tool GRASShopper [21] is based on translations of SL fragments to 
first-order logic with reachability predicates, and the use of SMT solvers to deal 
with the latter. The advantage is the integration with other SMT theories to 
reason about data. However, this approach considers a limited class of inductive 
definitions (for linked lists and trees) and is incapable of dealing with the size 
or multiset constraints, thus unable to reason about AVL or red-black trees. 

The truncation point approach m provides a method to specify and verify 
programs based on separation logic with inductive definitions that may specify 
truncated data structures with multiple holes, but it cannot deal with data
constraints. Our approach can also be extended to cover such inductive definitions.

9 Conclusion

We proposed a novel approach for automating program proofs based on Separation
Logic with inductive definitions. This approach consists of (1) efficiently
checkable syntactic criteria for recognizing inductive definitions that satisfy
crucial lemmas in such proofs and (2) a novel proof strategy for applying these
lemmas. The proof strategy relies on syntactic matching of spatial atoms and on
SMT solvers for checking data constraints. We have implemented this approach
in our solver SPEN and applied it successfully to a representative set of examples,
coming from iterative procedures for binary search trees or lists.

In the future, we plan to investigate extensions to more general inductive
definitions, using ideas from [9,22] to extend our proof strategy. From
a practical point of view, apart from improving the implementation of our proof
strategy, we plan to integrate it into the program analysis framework Celia.

A Proofs in Sec. 4

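Theorem 1. Suppose that $\mathcal{P}$ is a set of inductively defined predicates. If $P \in \mathcal{P}$
is syntactically compositional, then $P$ is semantically compositional.

Proof. Suppose $P$ is syntactically compositional and has parameters $(\vec{\alpha}, \vec{\beta}, \vec{\xi})$.
It is sufficient to prove the following claim.

For each pair $(s, h)$, if $(s, h) \models P(\vec{\alpha}_1, \vec{\alpha}_2, \vec{\xi}') * P(\vec{\alpha}_2, \vec{\alpha}_3, \vec{\xi}')$, then
$(s, h) \models P(\vec{\alpha}_1, \vec{\alpha}_3, \vec{\xi}')$.

We prove the claim by induction on the size of $ldom(h)$.

Suppose for each $i : 1 \le i \le 3$, $\vec{\alpha}_i = E_i\, v_i$, where $E_i$ and $v_i$ are respectively
location and data variables.

Since $(s, h) \models P(\vec{\alpha}_1, \vec{\alpha}_2, \vec{\xi}') * P(\vec{\alpha}_2, \vec{\alpha}_3, \vec{\xi}')$, there are $h_1, h_2$ such that
$h = h_1 * h_2$, $(s, h_1) \models P(\vec{\alpha}_1, \vec{\alpha}_2, \vec{\xi}')$, and $(s, h_2) \models P(\vec{\alpha}_2, \vec{\alpha}_3, \vec{\xi}')$.

If $(s, h_1) \models \bigwedge_{i=1}^{2} \alpha_{1,i} = \alpha_{2,i} \land emp$, then $ldom(h_1) = \emptyset$ and $h_2 = h$. From this,
we deduce that $(s, h) \models P(\vec{\alpha}_1, \vec{\alpha}_3, \vec{\xi}')$.

Otherwise, there are a recursive rule of $P$, say
$P(\vec{\alpha}, \vec{\beta}, \vec{\xi}) ::= \exists X.\, \Pi \land \Sigma_1 * \Sigma_2 * P(\vec{\gamma}, \vec{\beta}, \vec{\xi})$,
and an extension of $s$, say $s'$, such that
$(s', h_1) \models \Pi' \land \Sigma_1' * \Sigma_2' * P(\vec{\gamma}', \vec{\alpha}_2, \vec{\xi}')$,
where $\Pi', \Sigma_1', \Sigma_2', \vec{\gamma}'$ are obtained from $\Pi, \Sigma_1, \Sigma_2, \vec{\gamma}$ by
replacing $\vec{\alpha}, \vec{\beta}, \vec{\xi}$ with $\vec{\alpha}_1, \vec{\alpha}_2, \vec{\xi}'$ respectively. From this, we deduce that there are
$h_{1,1}, h_{1,2}, h_{1,3}$ such that $h_1 = h_{1,1} * h_{1,2} * h_{1,3}$, $(s', h_{1,1}) \models \Sigma_1'$, $(s', h_{1,2}) \models \Sigma_2'$,
and $(s', h_{1,3}) \models P(\vec{\gamma}', \vec{\alpha}_2, \vec{\xi}')$. Then
$(s', h_{1,3} * h_2) \models P(\vec{\gamma}', \vec{\alpha}_2, \vec{\xi}') * P(\vec{\alpha}_2, \vec{\alpha}_3, \vec{\xi}')$.
From the induction hypothesis, we deduce that $(s', h_{1,3} * h_2) \models P(\vec{\gamma}', \vec{\alpha}_3, \vec{\xi}')$.
Then $(s', h_{1,1} * h_{1,2} * h_{1,3} * h_2) \models \Pi' \land \Sigma_1' * \Sigma_2' * P(\vec{\gamma}', \vec{\alpha}_3, \vec{\xi}')$. We then deduce
that $(s, h) \models \exists X.\, \Pi' \land \Sigma_1' * \Sigma_2' * P(\vec{\gamma}', \vec{\alpha}_3, \vec{\xi}')$.

To prove $(s, h) \models P(\vec{\alpha}_1, \vec{\alpha}_3, \vec{\xi}')$, it is sufficient to prove that
$(s, h) \models \exists X.\, \Pi'' \land \Sigma_1'' * \Sigma_2'' * P(\vec{\gamma}'', \vec{\alpha}_3, \vec{\xi}')$,
where $\Pi'', \Sigma_1'', \Sigma_2'', \vec{\gamma}''$ are obtained from $\Pi, \Sigma_1, \Sigma_2, \vec{\gamma}$ by
replacing $\vec{\alpha}, \vec{\beta}, \vec{\xi}$ with $\vec{\alpha}_1, \vec{\alpha}_3, \vec{\xi}'$ respectively.

From the fact that no variables from $\vec{\beta}$ occur in $\Pi$, $\Sigma_1$, $\Sigma_2$, or $\vec{\gamma}$, we know
that $\Pi'' = \Pi'$, $\Sigma_1'' = \Sigma_1'$, $\Sigma_2'' = \Sigma_2'$, and $\vec{\gamma}'' = \vec{\gamma}'$. Since
$(s, h) \models \exists X.\, \Pi' \land \Sigma_1' * \Sigma_2' * P(\vec{\gamma}', \vec{\alpha}_3, \vec{\xi}')$, we have already proved that
$(s, h) \models \exists X.\, \Pi'' \land \Sigma_1'' * \Sigma_2'' * P(\vec{\gamma}'', \vec{\alpha}_3, \vec{\xi}')$. The proof is done. □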

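B Proofs in Sec. 5

Theorem 2. Let $P \in \mathcal{P}$ be a syntactically compositional predicate with the
parameters $(\vec{\alpha}, \vec{\beta}, \vec{\xi})$, and $P' \in \mathcal{P}$ with the parameters $(\vec{\alpha}, \vec{\xi})$. If $P'$ is a completion
of $P$ with respect to $\vec{c}$, then $P'(\vec{\alpha}, \vec{\xi}) \Leftrightarrow P(\vec{\alpha}, \vec{c}, \vec{\xi})$ and
$\exists \vec{\beta}.\, P(\vec{\alpha}, \vec{\beta}, \vec{\xi}) * P'(\vec{\beta}, \vec{\xi}) \Rightarrow P'(\vec{\alpha}, \vec{\xi})$ hold.

Proof. The fact $P'(\vec{\alpha}, \vec{\xi}) \Leftrightarrow P(\vec{\alpha}, \vec{c}, \vec{\xi})$ can be proved easily by an induction on
the size of the domain of the heap structures.

The argument for $\exists \vec{\beta}.\, P(\vec{\alpha}, \vec{\beta}, \vec{\xi}) * P'(\vec{\beta}, \vec{\xi}) \Rightarrow P'(\vec{\alpha}, \vec{\xi})$ goes as follows: Suppose
$(s, h) \models P(\vec{\alpha}, \vec{\beta}, \vec{\xi}) * P'(\vec{\beta}, \vec{\xi})$. Then there are $h_1, h_2$ such that $h = h_1 * h_2$,
$(s, h_1) \models P(\vec{\alpha}, \vec{\beta}, \vec{\xi})$, and $(s, h_2) \models P'(\vec{\beta}, \vec{\xi})$. From the fact that
$P'(\vec{\beta}, \vec{\xi}) \Rightarrow P(\vec{\beta}, \vec{c}, \vec{\xi})$, we know that $(s, h_2) \models P(\vec{\beta}, \vec{c}, \vec{\xi})$. Therefore,
$(s, h) \models P(\vec{\alpha}, \vec{\beta}, \vec{\xi}) * P(\vec{\beta}, \vec{c}, \vec{\xi})$.
From Theorem 1, we deduce that $(s, h) \models P(\vec{\alpha}, \vec{c}, \vec{\xi})$. From the fact
$P(\vec{\alpha}, \vec{c}, \vec{\xi}) \Rightarrow P'(\vec{\alpha}, \vec{\xi})$, we conclude that $(s, h) \models P'(\vec{\alpha}, \vec{\xi})$. □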

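Theorem 3. Let $P, P' \in \mathcal{P}$ be two syntactically compositional inductively defined
predicates with the same set of parameters $(\vec{\alpha}, \vec{\beta}, \vec{\xi})$. If $P'$ is stronger than $P$,
then the entailments $P'(\vec{\alpha}, \vec{\beta}, \vec{\xi}) \Rightarrow P(\vec{\alpha}, \vec{\beta}, \vec{\xi})$ and
$\exists \vec{\beta}.\, P'(\vec{\alpha}, \vec{\beta}, \vec{\xi}) * P(\vec{\beta}, \vec{\gamma}, \vec{\xi}) \Rightarrow P(\vec{\alpha}, \vec{\gamma}, \vec{\xi})$ hold.

Proof. We first show that $P'(\vec{\alpha}, \vec{\beta}, \vec{\xi}) \Rightarrow P(\vec{\alpha}, \vec{\beta}, \vec{\xi})$. By induction on the size
of $ldom(h)$, we prove the following fact: For each $(s, h)$, if $(s, h) \models P'(\vec{\alpha}, \vec{\beta}, \vec{\xi})$,
then $(s, h) \models P(\vec{\alpha}, \vec{\beta}, \vec{\xi})$.

Suppose $(s, h) \models P'(\vec{\alpha}, \vec{\beta}, \vec{\xi})$.

If $(s, h) \models \bigwedge_{i=1}^{2} \alpha_i = \beta_i \land emp$, since $P'$ and $P$ have the same base rule, we
deduce that $(s, h) \models P(\vec{\alpha}, \vec{\beta}, \vec{\xi})$.

Otherwise, there are a recursive rule of $P'$, say
$P'(\vec{\alpha}, \vec{\beta}, \vec{\xi}) ::= \exists X.\, \Pi' \land \Sigma_1 * \Sigma_2 * P'(\vec{\gamma}, \vec{\beta}, \vec{\xi})$,
and an extension of $s$, say $s'$, such that
$(s', h) \models \Pi' \land \Sigma_1 * \Sigma_2 * P'(\vec{\gamma}, \vec{\beta}, \vec{\xi})$.
Then there are $h_1, h_2, h_3$ such that $h = h_1 * h_2 * h_3$, $(s', h_1) \models \Sigma_1$,
$(s', h_2) \models \Sigma_2$, and $(s', h_3) \models P'(\vec{\gamma}, \vec{\beta}, \vec{\xi})$. From the induction hypothesis, we
deduce that $(s', h_3) \models P(\vec{\gamma}, \vec{\beta}, \vec{\xi})$. Moreover, from the assumption, we know that
there is a recursive rule of $P$ of the form
$P(\vec{\alpha}, \vec{\beta}, \vec{\xi}) ::= \exists X.\, \Pi \land \Sigma_1 * \Sigma_2 * P(\vec{\gamma}, \vec{\beta}, \vec{\xi})$,
such that $\Pi' \Rightarrow \Pi$ holds. Then it follows that
$(s', h_1 * h_2 * h_3) \models \Pi \land \Sigma_1 * \Sigma_2 * P(\vec{\gamma}, \vec{\beta}, \vec{\xi})$.
We then deduce that $(s, h) \models \exists X.\, \Pi \land \Sigma_1 * \Sigma_2 * P(\vec{\gamma}, \vec{\beta}, \vec{\xi})$.
From this, we conclude that $(s, h) \models P(\vec{\alpha}, \vec{\beta}, \vec{\xi})$.

We then prove the second claim of the theorem.

From the argument above, we know that $P'(\vec{\alpha}, \vec{\beta}, \vec{\xi}) \Rightarrow P(\vec{\alpha}, \vec{\beta}, \vec{\xi})$ holds.
Then $P'(\vec{\alpha}, \vec{\beta}, \vec{\xi}) * P(\vec{\beta}, \vec{\gamma}, \vec{\xi}) \Rightarrow P(\vec{\alpha}, \vec{\beta}, \vec{\xi}) * P(\vec{\beta}, \vec{\gamma}, \vec{\xi})$ holds. In addition,
from Theorem 1, we know that $P(\vec{\alpha}, \vec{\beta}, \vec{\xi}) * P(\vec{\beta}, \vec{\gamma}, \vec{\xi}) \Rightarrow P(\vec{\alpha}, \vec{\gamma}, \vec{\xi})$ holds.
Therefore, we conclude that $P'(\vec{\alpha}, \vec{\beta}, \vec{\xi}) * P(\vec{\beta}, \vec{\gamma}, \vec{\xi}) \Rightarrow P(\vec{\alpha}, \vec{\gamma}, \vec{\xi})$. □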

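Theorem 4. Let $P \in \mathcal{P}$ be a syntactically compositional predicate with the
parameters $(\vec{\alpha}, \vec{\beta}, \vec{\xi})$ and $P' \in \mathcal{P}$ be an inductive predicate with the parameters
$(\vec{\alpha}, \vec{\beta}, \vec{\xi}')$. If $P'$ is a static-parameter contraction of $P$ with the contraction
function $\eta$, then $P'(\vec{\alpha}, \vec{\beta}, \vec{\xi}') \Leftrightarrow P(\vec{\alpha}, \vec{\beta}, \eta(\vec{\xi}))$ and
$\exists \vec{\beta}.\, P(\vec{\alpha}, \vec{\beta}, \eta(\vec{\xi})) * P'(\vec{\beta}, \vec{\gamma}, \vec{\xi}') \Rightarrow P'(\vec{\alpha}, \vec{\gamma}, \vec{\xi}')$ hold.

Proof. The first claim can be proved by induction on the size of the domain of
the heap structures.

The argument for the second claim goes as follows: From the fact that
$P'(\vec{\beta}, \vec{\gamma}, \vec{\xi}') \Leftrightarrow P(\vec{\beta}, \vec{\gamma}, \eta(\vec{\xi}))$, we deduce that

$\exists \vec{\beta}.\, P(\vec{\alpha}, \vec{\beta}, \eta(\vec{\xi})) * P'(\vec{\beta}, \vec{\gamma}, \vec{\xi}') \Rightarrow P(\vec{\alpha}, \vec{\beta}, \eta(\vec{\xi})) * P(\vec{\beta}, \vec{\gamma}, \eta(\vec{\xi}))$.

From Theorem 1, we know that

$P(\vec{\alpha}, \vec{\beta}, \eta(\vec{\xi})) * P(\vec{\beta}, \vec{\gamma}, \eta(\vec{\xi})) \Rightarrow P(\vec{\alpha}, \vec{\gamma}, \eta(\vec{\xi}))$.

Then the second claim follows from the fact $P(\vec{\alpha}, \vec{\gamma}, \eta(\vec{\xi})) \Leftrightarrow P'(\vec{\alpha}, \vec{\gamma}, \vec{\xi}')$. □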

C Extensions of the lemmas

In this section, we discuss how the basic idea of syntactic compositionality
can be extended in various ways.

C.1 Multiple location and data parameters

First, we would like to emphasize that, although we restrict our discussion of
compositional predicates $P(\vec{\alpha}, \vec{\beta}, \vec{\xi})$ to the special case where $\vec{\alpha}$ (resp. $\vec{\beta}$) contains
only two parameters, one location parameter and one data parameter, all
the results about the lemmas can be generalized smoothly to the situation where
$\vec{\alpha}$ and $\vec{\beta}$ contain multiple location and data parameters.

C.2 Pseudo-composition lemmas

We then consider syntactically pseudo-compositional predicates.

We still use the binary search trees to illustrate the idea.

Suppose $neqbsthole$ is the predicate defined by the same rules as $bsthole$,
with the modification that $E \neq F$ is added to the body of each inductive rule.
Then $neqbsthole$ is not syntactically compositional anymore and the composition
lemma

$\exists E_2, M_2.\, neqbsthole(E_1, M_1, E_2, M_2) * neqbsthole(E_2, M_2, E_3, M_3) \Rightarrow neqbsthole(E_1, M_1, E_3, M_3)$

does not hold. This is explained as follows: Suppose $h = h_1 * h_2$ (where $h = h_1 * h_2$
denotes that $h_1$ and $h_2$ are domain disjoint and $h$ is the union of $h_1$ and $h_2$),
$(s, h_1) \models neqbsthole(E_1, M_1, E_2, M_2)$ and $(s, h_2) \models neqbsthole(E_2, M_2, E_3, M_3)$,
and, in addition, both $ldom(h_1)$ and $ldom(h_2)$ are nonempty. Then from the inductive
definition of $neqbsthole$, we deduce that $s(E_1) \neq s(E_2)$ and $s(E_2) \neq s(E_3)$. On
the other hand, $(s, h) \models neqbsthole(E_1, M_1, E_3, M_3)$ requires that $s(E_1) \neq s(E_3)$,
which cannot be inferred from $s(E_1) \neq s(E_2)$ and $s(E_2) \neq s(E_3)$ in general.
Nevertheless, the entailment

$\exists E_2, M_2.\, neqbsthole(E_1, M_1, E_2, M_2) * neqbsthole(E_2, M_2, E_3, M_3) * E_3 \mapsto \{(\mathtt{left}, X), (\mathtt{right}, Y), (\mathtt{data}, v)\}$
$\Rightarrow neqbsthole(E_1, M_1, E_3, M_3) * E_3 \mapsto \{(\mathtt{left}, X), (\mathtt{right}, Y), (\mathtt{data}, v)\}$

holds since the information $E_1 \neq E_3$ can be inferred from the fact that $E_3$ is
allocated and separated from $E_1$. Therefore, intuitively, in this situation, the
composition lemma can be applied under the condition that we already know
that $E_1 \neq E_3$. We call this pseudo-compositionality. Our decision procedure
can be generalized to apply the pseudo-composition lemmas when proving the
entailment of two formulas.


C.3 Data structures with parent pointers

Next, we show how our ideas can be generalized to data structures with
parent pointers, e.g., doubly linked lists or trees with parent pointers. We use
binary search trees with parent pointers to illustrate the idea. We can define the
predicates $prtbst(E, Pr, M)$ and $prtbsthole(E, Pr_1, M_1, F, Pr_2, M_2)$ to describe
respectively binary search trees with parent pointers and binary search trees
with parent pointers and one hole. Intuitively, $E$ and $F$ are still the source
and the hole, while $Pr$ and $Pr_1$ (resp. $Pr_2$) are the parent of $E$ (resp. $F$) (the
definition of $prtbst$ is omitted here).

$prtbsthole(E, Pr_1, M_1, F, Pr_2, M_2) ::= E = F \land emp \land Pr_1 = Pr_2 \land M_1 = M_2$

$prtbsthole(E, Pr_1, M_1, F, Pr_2, M_2) ::= \exists X, Y, M_3, M_4, v.$
$\quad E \mapsto \{(\mathtt{left}, X), (\mathtt{right}, Y), (\mathtt{parent}, Pr_1), (\mathtt{data}, v)\}$
$\quad * \; prtbst(X, E, M_3) * prtbsthole(Y, E, M_4, F, Pr_2, M_2)$
$\quad \land \; M_1 = \{v\} \cup M_3 \cup M_4 \land M_3 < v < M_4$

$prtbsthole(E, Pr_1, M_1, F, Pr_2, M_2) ::= \exists X, Y, M_3, M_4, v.$
$\quad E \mapsto \{(\mathtt{left}, X), (\mathtt{right}, Y), (\mathtt{parent}, Pr_1), (\mathtt{data}, v)\}$
$\quad * \; prtbsthole(X, E, M_3, F, Pr_2, M_2) * prtbst(Y, E, M_4)$
$\quad \land \; M_1 = \{v\} \cup M_3 \cup M_4 \land M_3 < v < M_4$

Then the predicate $prtbsthole$ enjoys the composition lemma

$\exists E_2, Pr_2, M_2.\, prtbsthole(E_1, Pr_1, M_1, E_2, Pr_2, M_2) * prtbsthole(E_2, Pr_2, M_2, E_3, Pr_3, M_3)$
$\Rightarrow prtbsthole(E_1, Pr_1, M_1, E_3, Pr_3, M_3)$.


C.4 Points-to atom in base rules

Finally, we discuss the constraint that the base rule of a syntactically
compositional predicate has an empty spatial atom. We use the predicates $lsegeven$ and
$lsegodd$ to illustrate the idea.

$lsegeven(E, F) ::= E = F \land emp$,

$lsegeven(E, F) ::= \exists X, Y.\, E \mapsto (\mathtt{next}, X) * X \mapsto (\mathtt{next}, Y) * lsegeven(Y, F)$.

The definition of $lsegodd(E, F)$ can be obtained from that of $lsegeven(E, F)$ by
replacing the base rule with the rule $lsegodd(E, F) ::= E \mapsto (\mathtt{next}, F)$. The only
difference between the inductive definition of $lsegeven$ and that of $lsegodd$
is that $lsegeven$ has an empty base rule, while $lsegodd$ does not. From this, we
deduce that

$lsegodd(E, F) \Leftrightarrow \exists X.\, E \mapsto \{(\mathtt{next}, X)\} * lsegeven(X, F)$.

This idea can be generalized to arbitrary syntactically compositional predicates.
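
The relation between $lsegodd$ and $lsegeven$ can be checked by brute force on small heaps. The sketch below is an illustration under explicit assumptions: heaps have at most three locations, each cell has only a next field, location 0 plays the role of nil, and satisfaction is evaluated on the exact heap. The helper names are hypothetical and the code is unrelated to SPEN.

```python
from itertools import product

NIL = 0            # assumption: location 0 stands for nil and is never allocated
LOCS = (1, 2, 3)   # assumption: a universe of three allocatable locations


def _without(heap, *cells):
    return {k: v for k, v in heap.items() if k not in cells}


def sat_even(heap, e, f):
    """(heap) satisfies lsegeven(e, f), evaluated on the exact heap."""
    if not heap:
        return e == f                      # base rule: E = F and emp
    if e in heap:
        x = heap[e]
        if x != e and x in heap:           # two separated cells E and X
            return sat_even(_without(heap, e, x), heap[x], f)
    return False


def sat_odd(heap, e, f):
    """(heap) satisfies lsegodd(e, f), evaluated on the exact heap."""
    if heap == {e: f}:
        return True                        # base rule: E points to F
    if e in heap:
        x = heap[e]
        if x != e and x in heap:
            return sat_odd(_without(heap, e, x), heap[x], f)
    return False


def sat_rhs(heap, e, f):
    """(heap) satisfies: exists X. E -> (next, X) * lsegeven(X, F)."""
    return e in heap and sat_even(_without(heap, e), heap[e], f)


def all_heaps():
    targets = (NIL,) + LOCS
    for mask in product((False, True), repeat=len(LOCS)):
        dom = [loc for loc, used in zip(LOCS, mask) if used]
        for nexts in product(targets, repeat=len(dom)):
            yield dict(zip(dom, nexts))


def equivalence_holds():
    targets = (NIL,) + LOCS
    return all(sat_odd(h, e, f) == sat_rhs(h, e, f)
               for h in all_heaps() for e in targets for f in targets)


print(equivalence_holds())
```

Such an exhaustive check on bounded heaps is only evidence, not a proof; the general statement follows from the inductive definitions.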

D Full example of Sec. 6

We provide here the full details of the example considered in Section 6.

Consider the following entailment which states that two cells linked by the
next pointer field, and storing ordered data values, form a sorted list segment:

$\varphi_1 ::= x_1 \neq nil \land x_2 \neq nil \land v_1 < v_2 \land x_1 \mapsto \{(\mathtt{next}, x_2), (\mathtt{data}, v_1)\} * x_2 \mapsto \{(\mathtt{next}, nil), (\mathtt{data}, v_2)\}$

$\varphi_2 ::= \exists M.\, lseg(x_1, M, nil, \emptyset) \land v_2 \in M$,

where $lseg$ has been defined in Sec. 1 (eq. (3)-(4)).
For convenience, let

$\Pi_1 ::= x_1 \neq nil \land x_2 \neq nil \land v_1 < v_2$,

$\Sigma_1 ::= x_1 \mapsto \{(\mathtt{next}, x_2), (\mathtt{data}, v_1)\} * x_2 \mapsto \{(\mathtt{next}, nil), (\mathtt{data}, v_2)\}$,

$\Pi_2 ::= v_2 \in M$,

$\Sigma_2 ::= lseg(x_1, M, nil, \emptyset)$.

The first application of the rule Slice. Since the right-hand side contains a
single spatial atom, the rule Slice generates a sub-goal $\Pi_1 \land \Sigma_1 \models_{\eta} \exists M.\, \Sigma_2$. For
the sub-goal, the syntactic matching (rule Match1) cannot be applied. Instead,
we apply the rule Lemma using a lemma $L$ that corresponds to the inductive
rule of $lseg$, i.e., eq. (4):

$L ::= \exists X, M_1, v.\, x_1 \mapsto \{(\mathtt{next}, X), (\mathtt{data}, v)\} * lseg(X, M_1, nil, \emptyset) \land M = \{v\} \cup M_1 \land v < M_1 \Rightarrow lseg(x_1, M, nil, \emptyset)$.

For convenience, let

$\Pi ::= M = \{v\} \cup M_1 \land v < M_1$,

$\Sigma ::= x_1 \mapsto \{(\mathtt{next}, X), (\mathtt{data}, v)\} * lseg(X, M_1, nil, \emptyset)$.

The first application of the rule Lemma. Since $root(L) ::= x_1 \mapsto \{(\mathtt{next}, X), (\mathtt{data}, v)\}$,
the rule Lemma generates a sub-goal $\Pi_1 \land \Sigma_1' \models^{SUB}_{\eta_1} \exists X, v.\, root(L)$,
where $\Sigma_1' ::= x_1 \mapsto \{(\mathtt{next}, x_2), (\mathtt{data}, v_1)\}$. Then the rule
Match1 is applied, resulting in a quantifier instantiation $\eta_1 = \{X \to x_2, v \to v_1\}$.
Note that, since $EQ(\eta_1|_{free(\exists X, v.\, root(L))}) ::= true$, the entailment
$\Pi_1 \land EQ(\eta_1|_{\{X, v\}}) \models EQ(\eta_1|_{free(\exists X, v.\, root(L))})$ holds. The variable substitution $\eta_1$ is used
to instantiate the existentially quantified variables in the remaining part of the
lemma, that is, $\Pi \land lseg(X, M_1, nil, \emptyset)$, resulting in the formula

$\eta_1(\Pi \land lseg(X, M_1, nil, \emptyset)) ::= M = \{v_1\} \cup M_1 \land v_1 < M_1 \land lseg(x_2, M_1, nil, \emptyset)$.

Then, the rule Lemma generates another sub-goal
$\Pi_1 \land \Sigma_1'' \models_{\eta_2} \exists M, M_1.\, \eta_1(\Pi \land lseg(X, M_1, nil, \emptyset))$,
where $\Sigma_1'' ::= x_2 \mapsto \{(\mathtt{next}, nil), (\mathtt{data}, v_2)\}$.

The second application of the rule Slice. For the sub-goal
$\Pi_1 \land \Sigma_1'' \models_{\eta_2} \exists M, M_1.\, \eta_1(\Pi \land lseg(X, M_1, nil, \emptyset))$, the rule Slice is applied again. Since there is
a single spatial atom in the RHS, the rule Slice generates a sub-goal
$\Pi_1 \land \Sigma_1'' \models_{\eta_3} \exists M_1.\, lseg(x_2, M_1, nil, \emptyset)$.

The second application of the rule Lemma. For the sub-goal
$\Pi_1 \land \Sigma_1'' \models_{\eta_3} \exists M_1.\, lseg(x_2, M_1, nil, \emptyset)$, the rule Lemma is applied again, using the lemma $L'$
(still corresponding to the inductive rule of $lseg$),

$L' ::= \exists X', M_1', v'.\, x_2 \mapsto \{(\mathtt{next}, X'), (\mathtt{data}, v')\} * lseg(X', M_1', nil, \emptyset) \land M_1 = \{v'\} \cup M_1' \land v' < M_1' \Rightarrow lseg(x_2, M_1, nil, \emptyset)$.

For convenience, let

$\Pi' ::= M_1 = \{v'\} \cup M_1' \land v' < M_1'$,

$\Sigma' ::= x_2 \mapsto \{(\mathtt{next}, X'), (\mathtt{data}, v')\} * lseg(X', M_1', nil, \emptyset)$.

Since $root(L') ::= x_2 \mapsto \{(\mathtt{next}, X'), (\mathtt{data}, v')\}$, the rule Lemma generates a
sub-goal $\Pi_1 \land \Sigma_1'' \models^{SUB}_{\eta_1'} \exists X', v'.\, root(L')$. Then the rule Match1 is applied,
resulting in a quantifier instantiation $\eta_1' = \{X' \to nil, v' \to v_2\}$. Note that,
since $EQ(\eta_1'|_{free(\exists X', v'.\, root(L'))}) ::= true$, the entailment
$\Pi_1 \land EQ(\eta_1'|_{\{X', v'\}}) \models EQ(\eta_1'|_{free(\exists X', v'.\, root(L'))})$ holds. The variable substitution $\eta_1'$ is used to
instantiate the existentially quantified variables in the remaining part of the lemma,
that is, $\Pi' \land lseg(X', M_1', nil, \emptyset)$, resulting in the formula

$\eta_1'(\Pi' \land lseg(X', M_1', nil, \emptyset)) ::= M_1 = \{v_2\} \cup M_1' \land v_2 < M_1' \land lseg(nil, M_1', nil, \emptyset)$.

Then, the rule Lemma generates another sub-goal
$\Pi_1 \land emp \models_{\eta_2'} \exists M_1, M_1'.\, \eta_1'(\Pi' \land lseg(X', M_1', nil, \emptyset))$.

The third application of the rule Slice. For the sub-goal
$\Pi_1 \land emp \models_{\eta_2'} \exists M_1, M_1'.\, \eta_1'(\Pi' \land lseg(X', M_1', nil, \emptyset))$, since there is only one spatial atom
$lseg(nil, M_1', nil, \emptyset)$ in the RHS, the rule Slice generates a sub-goal
$\Pi_1 \land emp \models_{\eta_3'} \exists M_1'.\, lseg(nil, M_1', nil, \emptyset)$, for which the rule Lemma is applied, with a lemma $L''$
corresponding to the base rule of $lseg$, i.e., eq. (3),

$L'' ::= nil = nil \land emp \land M_1' = \emptyset \Rightarrow lseg(nil, M_1', nil, \emptyset)$.

The third application of the rule Lemma. Since $root(L'') ::= emp$, the
rule Lemma generates a sub-goal $\Pi_1 \land emp \models^{SUB}_{\eta_1''} root(L'')$. Then the rule
Match1 is applied, resulting in a quantifier instantiation $\eta_1'' = \emptyset$. Note
that, since $EQ(\eta_1''|_{free(root(L''))}) ::= true$, the entailment
$\Pi_1 \land EQ(\eta_1''|_{\emptyset}) \models EQ(\eta_1''|_{free(root(L''))})$ holds. The substitution $\eta_1''$ is used to instantiate the
existential variables in the remaining part of the lemma, that is, $nil = nil \land M_1' = \emptyset$,
resulting in the same formula. Then the rule Lemma generates a sub-goal
$\Pi_1 \models_{\eta_2''} nil = nil \land M_1' = \emptyset$, which clearly holds with $\eta_2'' = \emptyset$. Finally,
$\eta_1'' \cup \eta_2''$ is extended with $nil = nil \land M_1' = \emptyset$, resulting in $\eta_3' = \{M_1' \to \emptyset\}$.

The third application of the rule Slice (continued). The variable substitution
$\eta_3'$ is extended with $M_1 = \{v_2\} \cup M_1' \land v_2 < M_1'$, resulting in
$\eta_2' = \{M_1 \to \{v_2\} \cup \emptyset, M_1' \to \emptyset\}$. Then the rule Slice generates a sub-goal
$\Pi_1 \land EQ(\eta_2') \models M_1 = \{v_2\} \cup M_1' \land v_2 < M_1'$. Because $EQ(\eta_2') = M_1 = \{v_2\} \cup \emptyset \land M_1' = \emptyset$,
we know that $EQ(\eta_2') \models M_1 = \{v_2\} \cup M_1' \land v_2 < M_1'$. Thus the sub-goal holds.

The second application of the rule Lemma (continued). The variable substitution
$\eta_1' \cup \eta_2'$ should be extended with $\Pi' ::= M_1 = \{v'\} \cup M_1' \land v' < M_1'$.
Since $M_1 \in dom(\eta_2')$, the extension has no effect. Then $\eta_3 = (\eta_1' \cup \eta_2')|_{\{M_1\}} = \{M_1 \to \{v_2\} \cup \emptyset\}$.

The second application of the rule Slice (continued). The variable substitution
$\eta_3$ is extended with $M = \{v_1\} \cup M_1 \land v_1 < M_1$, resulting in
$\eta_2 = \{M \to \{v_1\} \cup \{v_2\} \cup \emptyset, M_1 \to \{v_2\} \cup \emptyset\}$. Then the rule Slice generates a sub-goal
$\Pi_1 \land EQ(\eta_2) \models M = \{v_1\} \cup M_1 \land v_1 < M_1$. Since $\Pi_1 ::= x_1 \neq nil \land x_2 \neq nil \land v_1 < v_2$
and $EQ(\eta_2) ::= M = \{v_1\} \cup \{v_2\} \cup \emptyset \land M_1 = \{v_2\} \cup \emptyset$, the sub-goal holds.

The first application of the rule Lemma (continued). The variable substitution
$\eta_1 \cup \eta_2$ should be extended with $\Pi ::= M = \{v\} \cup M_1 \land v < M_1$. Since
$M \in dom(\eta_1 \cup \eta_2)$, this extension has no effect. Then $\eta$ is obtained from $\eta_1 \cup \eta_2$
by restricting to $\{M\}$. So $\eta = \{M \to \{v_1\} \cup \{v_2\} \cup \emptyset\}$.

The first application of the rule Slice (continued). The variable substitution
$\eta$ is extended with $\Pi_2 ::= v_2 \in M$, which still yields $\eta$. Finally, Slice
generates the sub-goal $\Pi_1 \land EQ(\eta) \models \Pi_2$. Since $EQ(\eta) ::= M = \{v_1\} \cup \{v_2\} \cup \emptyset$,
the entailment $EQ(\eta) \models \Pi_2$ holds. Therefore, the sub-goal $\Pi_1 \land EQ(\eta) \models \Pi_2$ holds
as well.
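
The bottom-up construction of the instantiations in the "(continued)" steps can be replayed on concrete data values. The sketch below assumes integer data, represents multisets with `collections.Counter`, and treats the union as a multiset sum; the function name `replay` and the sample values are illustrative only.

```python
from collections import Counter


def replay(v1: int, v2: int):
    """Rebuild M1', M1 and M bottom-up and evaluate the pure side conditions."""
    m1_prime = Counter()               # base rule of lseg: M1' is empty
    m1 = Counter([v2]) + m1_prime      # M1 = {v2} U M1'
    m = Counter([v1]) + m1             # M  = {v1} U M1
    checks = (
        all(v2 < x for x in m1_prime.elements())   # v2 < M1' (vacuously true)
        and all(v1 < x for x in m1.elements())     # v1 < M1, relies on v1 < v2
        and v2 in m                                # the RHS constraint: v2 in M
    )
    return m, checks


m, ok = replay(3, 7)   # hypothetical values satisfying v1 < v2
print(sorted(m.elements()), ok)
```

With values violating $v_1 < v_2$, the second condition fails, which corresponds to the role of $\Pi_1$ in discharging the pure sub-goals.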



